ESTScan Protein Prediction

CLC Assembly

CarabicaBGItrim_CenicafeCArabica.fasta = 55,554 contigs

ESTScan using the Solanum lycopersicum matrix = 54,862 predicted proteins

CAP3 Assembly

ESTScan with the Arabidopsis matrix did not work with the coffee unigene data set. We tried training the program on C. arabica data to produce a C. arabica matrix, but the process failed. We therefore used the Solanum lycopersicum matrix (le_mrna_nuclear.smat, from the species formerly named Lycopersicon esculentum), produced at SGN by the group led by Lukas Mueller at Cornell University, because this matrix has given good results with Coffea data in the past.

We obtained 37,774 sequences longer than 70 aa from the initial set of 41,139 RNA-seq unigenes.

We obtained 46,531 sequences longer than 70 aa from the hybrid assembly (CAP3, 51,561 predicted proteins) of Illumina RNA-seq + ESTs.
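The length cutoff used for these counts can be reproduced with a short script. The following is a minimal sketch in Python, not the script used for the results reported here: the function names `read_fasta` and `filter_by_length` are hypothetical, and it assumes the predicted proteins are in plain FASTA format and that "> 70 aa" means strictly more than 70 residues.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records: Iterable[Tuple[str, str]],
                     min_len: int = 70) -> List[Tuple[str, str]]:
    """Keep records with strictly more than min_len residues (assumed cutoff)."""
    return [(h, s) for h, s in records if len(s) > min_len]
```

To apply it to a file, pass an open file handle to `read_fasta` (for example `filter_by_length(read_fasta(open(path)))`, where `path` points to the ESTScan protein output) and count the returned records.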

BLASTp (1e-3) against plant proteins downloaded from GenBank gave 24,163 hits (63.8%)
BLASTx (1e-5) of Carabica49 Cenicafe ESTs (40,038 unigenes) vs. C. arabica peptides (CarabicaESTsvsCarabicapeps.bx) = 29,376 hits (73.4%)
BLASTn (1e-5) of CarabicaRNAseqUnigenes vs. Carabica49 Cenicafe ESTs (40,038 unigenes) = 24,473 hits (59.5%)
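The hit percentages are the number of query sequences with a hit divided by the number of queries. As a small check, here is a sketch in Python with a hypothetical helper `hit_rate`. It assumes the denominators are the query set sizes given on this page: 40,038 EST unigenes for BLASTx and 41,139 RNA-seq unigenes for BLASTn.

```python
def hit_rate(hits: int, total: int) -> float:
    """Return the percentage of queries with a hit, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * hits / total, 1)


# Denominators are assumptions taken from the set sizes quoted in the text.
blastx_rate = hit_rate(29376, 40038)  # EST unigenes vs. predicted peptides
blastn_rate = hit_rate(24473, 41139)  # RNA-seq unigenes vs. EST unigenes

print(blastx_rate)  # 73.4
print(blastn_rate)  # 59.5
```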