The CENICAFE Transcript Assemblies (TAs) are built following the approach used for the Transcript Assemblies created at TIGR (http://plantta.tigr.org/). The sequences used to build the TAs are expressed transcripts (EST sequences) collected from CENICAFE cDNA libraries. Unlike the TIGR assemblies, sequences without chromatograms and quality support are not included.
TAs are built by clustering and assembling sequences with the TGICL tool (Pertea et al., 2003), Megablast (Zhang et al., 2000) and the CAP3 assembler (Huang and Madan, 1999). TGICL is a wrapper script that invokes Megablast and CAP3. Sequences are first clustered based on an all-against-all comparison using Megablast. The initial clusters are then assembled with CAP3 to generate consensus sequences. Assembly criteria include a 50 bp minimum match, 95% minimum identity in the overlap region and 20 bp maximum unmatched overhangs.
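The three assembly criteria can be read as a simple pass/fail test on a candidate overlap between two sequences. The sketch below only illustrates that reading; it is not how TGICL or CAP3 implement the check, and the `Overlap` record, its field names and the function name are hypothetical.

```python
from dataclasses import dataclass

# Thresholds taken from the assembly criteria stated in the text.
MIN_MATCH_BP = 50          # minimum length of the overlap
MIN_IDENTITY_PERCENT = 95  # minimum identity within the overlap
MAX_OVERHANG_BP = 20       # maximum unmatched overhang


@dataclass
class Overlap:
    """Hypothetical summary of one pairwise overlap (not a TGICL/CAP3 type)."""
    length: int    # overlap length in bp
    matches: int   # identical positions within the overlap
    overhang: int  # longest unmatched overhang in bp


def passes_assembly_criteria(ov: Overlap) -> bool:
    """Return True if the overlap satisfies all three stated criteria."""
    if ov.length < MIN_MATCH_BP:
        return False
    # Integer arithmetic avoids floating-point rounding at the 95% boundary.
    identity_ok = ov.matches * 100 >= MIN_IDENTITY_PERCENT * ov.length
    return identity_ok and ov.overhang <= MAX_OVERHANG_BP
```

For instance, a 100 bp overlap with 96 identical positions and a 10 bp overhang would pass, while the same overlap with a 21 bp overhang would not.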
Any EST/cDNA sequences that are not assembled into TAs are included as singletons. Singletons have GenBank accession numbers if they have been deposited there. CENICAFE TA identifiers are of the form TAnumber_taxonID, where number is a unique numerical identifier of the transcript assembly and taxonID is the NCBI taxon id.
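As an illustration of the TAnumber_taxonID format, the following sketch splits an identifier into its two numeric parts. The function and type names are hypothetical, the identifier in the usage note is made up, and the sketch assumes that both parts consist of decimal digits only.

```python
import re
from typing import NamedTuple


class TAIdentifier(NamedTuple):
    number: int    # unique numerical identifier of the transcript assembly
    taxon_id: int  # NCBI taxon id


# Assumed shape: the literal "TA", digits, an underscore, digits.
_TA_PATTERN = re.compile(r"TA(\d+)_(\d+)")


def parse_ta_identifier(identifier: str) -> TAIdentifier:
    """Split a TA identifier into (number, taxon_id); raise ValueError if malformed."""
    match = _TA_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a TA identifier: {identifier!r}")
    return TAIdentifier(int(match.group(1)), int(match.group(2)))
```

Calling `parse_ta_identifier("TA123_49390")` returns a tuple with `number` 123 and `taxon_id` 49390. Singleton names such as GenBank accessions do not match the pattern and raise `ValueError`.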
To provide annotation for the TAs, each TA/singleton was aligned to a masked version of the UniProt UniRef100 database. Alignments were required to have at least 20% identity and 20% coverage. The annotation of the protein with the best alignment to each TA or singleton was used as the annotation for that sequence. In addition, the orientation of each TA/singleton relative to its best matching protein sequence was used to determine the orientation of that TA/singleton. Some sequences had no alignment to the protein database that met our quality criteria; those sequences have neither annotation nor orientation assignments.
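The annotation rule described above can be sketched as a filter followed by a best-hit selection. This is a minimal illustration and not the pipeline's code: the `ProteinHit` record and its fields are hypothetical, and "best alignment" is assumed here to mean the highest alignment score, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

MIN_IDENTITY_PERCENT = 20.0  # from the text
MIN_COVERAGE_PERCENT = 20.0  # from the text


@dataclass
class ProteinHit:
    """Hypothetical summary of one TA-to-protein alignment."""
    description: str  # annotation text of the matched protein
    identity: float   # percent identity, 0-100
    coverage: float   # percent coverage, 0-100
    score: float      # alignment score; higher is assumed to be better
    strand: str       # "+" or "-" relative to the protein


def annotate(hits: Iterable[ProteinHit]) -> Optional[Tuple[str, str]]:
    """Return (annotation, orientation) from the best passing hit, or None."""
    passing = [
        h for h in hits
        if h.identity >= MIN_IDENTITY_PERCENT and h.coverage >= MIN_COVERAGE_PERCENT
    ]
    if not passing:
        # No alignment met the criteria: no annotation and no orientation.
        return None
    best = max(passing, key=lambda h: h.score)  # assumption: best = top score
    return best.description, best.strand
```

A sequence whose hits all fall below either threshold yields `None`, which mirrors the sequences left without annotation or orientation.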
The release number for the plant TAs refers to the release version for a particular species. For the initial build, all TA sets are version 1. TA updates for subsequent releases will be carried out when new EST and cDNA sequences are produced in CENICAFE projects.
Plant Transcript Assemblies Overview
Navigate the tree below to locate your species of interest. For the time being, we are showing only C. canephora EST assemblies. The ESTs were produced by Nestle and the Solanaceae Genomics Group at Cornell University. The remaining species displayed here will be included in the near future.