ESTScan Protein Prediction
CarabicaBGItrim_CenicafeCArabica.fasta = 55,554 contigs
ESTScan using the Solanum lycopersicum matrix = 54,862 predicted proteins
CAP3 Assembly
ESTScan with the Arabidopsis matrix did not work on the coffee unigene data set. We tried training the program on C. arabica data to produce a C. arabica matrix, but the process failed. We therefore used the Solanum lycopersicum matrix (le_mrna_nuclear.smat, named after the older species name Lycopersicon esculentum), produced at SGN by the group led by Lukas Mueller at Cornell University, because this matrix has given good results with Coffea data in the past.
We obtained 37,774 sequences > 70 aa from the initial set of 41,139 RNA-seq unigenes.
We obtained 46,531 sequences > 70 aa from the hybrid assembly (CAP3; 51,561 predicted proteins) of Illumina RNA-seq + ESTs.
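The > 70 aa filter above can be illustrated with a minimal Python sketch. It assumes the predicted proteins are stored as a plain FASTA file. The function names `read_fasta` and `count_longer_than` are hypothetical and are not part of ESTScan or of the original pipeline, and the tool actually used for the length filter is not stated in these notes.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_longer_than(handle, min_len=70):
    """Return (kept, total): records strictly longer than min_len residues, and all records."""
    total = kept = 0
    for _, seq in read_fasta(handle):
        total += 1
        if len(seq) > min_len:
            kept += 1
    return kept, total


# Toy input (made-up sequences, not real data): 71, 70 and 3 residues.
example = ">a\n" + "M" * 71 + "\n>b\n" + "M" * 70 + "\n>c\nMKV\n"
kept, total = count_longer_than(io.StringIO(example))
print(kept, total)
```

On a real file, the same function could be called with `open("predicted_proteins.fasta")` in place of the `io.StringIO` object; that file name is a placeholder.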
BLASTp (E-value 1e-3) against plant proteins downloaded from GenBank gave 24,163 hits (63.8%)
BLASTx (E-value 1e-5) Carabica49 Cenicafe ESTs (40,038 Unigenes) CarabicaESTsvsCarabicapeps.bx = 29,376 hits (73.4%)
BLASTn (E-value 1e-5) CarabicaRNAseqUnigenes vs Carabica49 Cenicafe ESTs (40,038 Unigenes) = 24,473 hits (59.5%)
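The hit counts and percentages above can be reproduced along the following lines. This is a sketch that assumes the BLAST results were saved in tabular format (one line per alignment, query ID in the first tab-separated column) and that a "hit" means a query with at least one reported alignment. The notes do not state the output format or the definition of a hit, so both are assumptions, and `hit_rate` is a hypothetical helper name.

```python
def hit_rate(tabular_lines, n_queries):
    """Count distinct query IDs with at least one alignment and express them as a percentage."""
    queries_with_hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        queries_with_hits.add(line.split("\t")[0])  # first column = query ID
    n_hits = len(queries_with_hits)
    return n_hits, round(100.0 * n_hits / n_queries, 1)


# Toy tabular lines (made-up IDs): q1 has two alignments but is counted once.
toy = ["q1\ts1\t99.0", "q1\ts2\t90.0", "q2\ts1\t80.0", "# comment"]
print(hit_rate(toy, n_queries=4))

# The same arithmetic applied to the BLASTx figures reported above.
print(round(100.0 * 29376 / 40038, 1))
```

Counting distinct query IDs rather than lines matters because a single query usually produces several alignment lines.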