About CENICAFE Transcript Assemblies

CENICAFE Transcript Assemblies (TAs) are built following the approach used for the Transcript Assemblies created at TIGR (http://plantta.tigr.org/). The sequences used to build the TAs are expressed transcripts (EST sequences) collected from CENICAFE cDNA libraries. Unlike the TIGR assemblies, sequences that lack chromatogram and quality support are not included.

TAs are clustered and assembled using the TGICL tool (Pertea et al., 2003), Megablast (Zhang et al., 2000) and the CAP3 assembler (Huang and Madan, 1999). TGICL is a wrapper script that invokes Megablast and CAP3. Sequences are first clustered based on all-against-all comparisons using Megablast. The initial clusters are then assembled with CAP3 to generate consensus sequences. Assembly criteria include a 50 bp minimum match, 95% minimum identity in the overlap region and 20 bp maximum unmatched overhangs.
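The assembly criteria listed above can be illustrated with a minimal Python sketch. This is not how TGICL or CAP3 is implemented or invoked; the Overlap record and the function name are hypothetical, and the sketch assumes that the 20 bp overhang limit applies to each end of the overlap separately.

```python
from dataclasses import dataclass

# Thresholds taken from the assembly criteria described in the text.
MIN_MATCH_BP = 50          # minimum length of the matching region
MIN_IDENTITY_PCT = 95.0    # minimum percent identity in the overlap
MAX_OVERHANG_BP = 20       # maximum unmatched overhang (assumed: per end)


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise overlap (not a TGICL/CAP3 type)."""
    match_length: int      # length of the aligned region, in bp
    identity_pct: float    # percent identity within the aligned region
    overhang_5p: int       # unmatched bp at the 5' end
    overhang_3p: int       # unmatched bp at the 3' end


def passes_assembly_criteria(ov: Overlap) -> bool:
    """Return True if the overlap meets all three criteria."""
    return (
        ov.match_length >= MIN_MATCH_BP
        and ov.identity_pct >= MIN_IDENTITY_PCT
        and max(ov.overhang_5p, ov.overhang_3p) <= MAX_OVERHANG_BP
    )
```

For example, an overlap of 60 bp at 97.5% identity with overhangs of 5 and 10 bp would pass, while the same overlap with a 25 bp overhang on either end would be rejected.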

Any EST/cDNA sequences that are not assembled into TAs are included as singletons. Singletons carry GenBank accession numbers if they have been deposited there. CENICAFE TA identifiers are of the form TAnumber_taxonID, where number is a unique numerical identifier of the transcript assembly and taxonID is the NCBI taxon id.
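As an illustration of the identifier format, the following Python sketch splits a TA identifier into its two parts. It assumes that both number and taxonID are plain decimal digits with no padding or extra suffixes; the function name is hypothetical and the identifier in the usage note is made up for the example.

```python
import re
from typing import NamedTuple


class TAIdentifier(NamedTuple):
    number: int    # unique numerical identifier of the transcript assembly
    taxon_id: int  # NCBI taxon id


# Assumed pattern: literal "TA", digits, underscore, digits.
_TA_PATTERN = re.compile(r"^TA(\d+)_(\d+)$")


def parse_ta_identifier(identifier: str) -> TAIdentifier:
    """Split an identifier of the form TAnumber_taxonID into its parts."""
    match = _TA_PATTERN.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a TA identifier: {identifier!r}")
    return TAIdentifier(int(match.group(1)), int(match.group(2)))
```

With a made-up identifier, parse_ta_identifier("TA42_12345") gives number 42 and taxon id 12345. Singleton names such as GenBank accessions do not follow this pattern and raise ValueError.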

To provide annotation for the TAs, each TA/singleton was aligned to a masked version of the UniProt UniRef100 database. Alignments were required to have at least 20% identity and 20% coverage. The annotation of the protein with the best alignment to each TA or singleton was used as the annotation for that sequence. In addition, the orientation of each TA/singleton was determined from its orientation relative to the best matching protein sequence. Some sequences did not have alignments to the protein database that met our quality criteria, and those sequences have neither annotation nor orientation assignments.
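The annotation rule described above can be sketched in Python as follows. The ProteinHit record and the function name are hypothetical, and two details are assumptions because the text does not specify them: the "best" alignment is taken to be the one with the highest alignment score, and coverage is treated as a single percentage already computed by the aligner.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Quality criteria stated in the text.
ANNOT_MIN_IDENTITY_PCT = 20.0
ANNOT_MIN_COVERAGE_PCT = 20.0


@dataclass
class ProteinHit:
    """Hypothetical record for one TA-to-protein alignment."""
    protein_id: str
    description: str     # protein annotation to transfer
    identity_pct: float
    coverage_pct: float
    score: float         # assumed ranking criterion for "best" alignment
    strand: str          # "+" or "-": TA orientation relative to the protein


def select_annotation(hits: Iterable[ProteinHit]) -> Optional[ProteinHit]:
    """Return the best hit passing both thresholds, or None if none pass."""
    passing = [
        h for h in hits
        if h.identity_pct >= ANNOT_MIN_IDENTITY_PCT
        and h.coverage_pct >= ANNOT_MIN_COVERAGE_PCT
    ]
    if not passing:
        # No annotation and no orientation assignment for this sequence.
        return None
    return max(passing, key=lambda h: h.score)
```

The description and strand of the returned hit would supply the annotation and the orientation of the sequence. A result of None corresponds to the sequences that have neither.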

The release number for the plant TAs refers to the release version for a particular species. For the initial build, all TA sets are version 1. TA sets will be updated in new releases as new EST and cDNA sequences are produced in CENICAFE projects.

Plant Transcript Assemblies Overview

Navigate the tree below to locate your species of interest. For the time being, only C. canephora EST assemblies are shown. The ESTs were produced by Nestle and the Solanaceae Genomics Group at Cornell University. The remaining species displayed here will be added in the near future.


Coffea arabica
Coffea liberica
Coffea canephora


Beauveria bassiana


Hypothenemus hampei
Hypothenemus obscurus