BIOINFORMATICS AT CENICAFÉ
Bioinformatics is the application of computer science to molecular biology. The term was coined by Paulien Hogeweg and Ben Hesper in 1970 for the study of informatic processes in biotic systems. Since at least the late 1980s, its primary use has been in genomics and genetics, particularly in those areas of genomics that involve large-scale DNA sequencing. Over the past few decades, rapid developments in genomic and other molecular research technologies, combined with advances in information technology, have produced a tremendous amount of information related to molecular biology. Bioinformatics is the name given to the mathematical and computational approaches used to understand biological processes from these data. It now entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Common activities in bioinformatics include mapping and analyzing DNA and protein sequences, aligning DNA and protein sequences to compare them, and creating and viewing 3-D models of protein structures.
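Sequence alignment, one of the activities listed above, can be illustrated with a minimal global alignment in the style of the Needleman-Wunsch dynamic-programming algorithm. This is only a sketch: the scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function name `needleman_wunsch` are assumptions chosen for illustration, and search tools such as BLAST rely on substitution matrices and heuristics rather than this exact procedure.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (score, aligned_a, aligned_b).

    Illustrative scoring only (assumed): +1 match, -1 mismatch, -1 per gap.
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "ACT")` returns a score of 2 with the alignment `ACGT` / `AC-T` (three matches, one gap). When several alignments share the best score, the traceback order (diagonal first, then a gap in the second sequence, then a gap in the first) decides which one is reported.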
Genome assembly is the process of taking a large number of short DNA sequences, all generated by a shotgun sequencing project, and putting them back together to reconstruct the original chromosomes from which the DNA came. In a shotgun sequencing project, all the DNA from a source (usually a single organism, anything from a bacterium to a mammal) is first fragmented into millions of small pieces. These pieces are then “read” by automated sequencing machines, which can read up to 900 nucleotides (bases) at a time. (The four bases are adenine, guanine, cytosine, and thymine, represented by the letters A, G, C, and T.) A genome assembly algorithm compares all the pieces with one another and detects every place where two of these short sequences, or reads, overlap. Overlapping reads are merged into longer contiguous sequences (contigs), and the process is repeated with the merged sequences.
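The overlap-and-merge idea described above can be sketched as a toy greedy assembler: repeatedly find the pair of sequences with the longest suffix-prefix overlap and merge them until no overlap of a minimum length remains. This is a simplification under stated assumptions: reads are error-free, overlaps are exact, only the forward strand is considered (no reverse complements), and `min_len` is a hypothetical threshold. Production assemblers use more elaborate methods (for example, overlap graphs or de Bruijn graphs) and must cope with sequencing errors and repeats.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if none of length >= min_len)."""
    start = 0
    while True:
        # Next place where the first min_len bases of b occur in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept it only if the rest of a, from here to its end, is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def drop_contained(reads: list[str]) -> list[str]:
    """Remove duplicate reads and reads lying entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != other and r in other for other in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair with the longest overlap; return the resulting contigs."""
    contigs = drop_contained(reads)
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            olen = overlap(a, b, min_len)
            if olen > best_len:
                best_len, best_pair = olen, (a, b)
        if best_pair is None:
            return contigs  # no overlap of at least min_len remains
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])  # merge: a followed by the non-overlapping tail of b
        contigs = drop_contained(contigs)
```

With the four reads `ATTAGACCTG`, `CCTGCCGGAA`, `AGACCTGCCG`, and `GCCGGAATAC`, the sketch merges pairs with 7-base overlaps three times and returns the single contig `ATTAGACCTGCCGGAATAC`. Greedy merging is easy to follow but can mis-assemble repetitive regions, which is one reason real assemblers do not rely on it alone.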
Genome annotation is the process of attaching biological information to sequences. It consists of two main steps:
1. identifying elements on the genome, a process called gene prediction;
2. attaching biological information to these elements.
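Gene prediction in practice relies on statistical models trained on known genes. As a much simpler stand-in for the first step, the sketch below scans a sequence for open reading frames (ORFs): stretches that start with the codon ATG and run, codon by codon, to the first in-frame stop codon (TAA, TAG, or TGA in the standard genetic code). The function name `find_orfs` and the `min_codons` threshold are hypothetical, and only the three forward-strand reading frames are scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is 0-based; end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (codons including the stop).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs
```

For `CCATGAAATTTTAGCC` with `min_codons=2`, the only ORF found is in frame 2, spanning `ATGAAATTTTAG` (positions 2 to 14). An ORF scan like this overpredicts genes badly on real genomes, so it serves only to show what "identifying elements" means computationally.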
Automatic annotation tools attempt to carry out both annotation steps entirely by computer analysis, as opposed to manual annotation (also known as curation), which relies on human expertise. Ideally, the two approaches coexist and complement each other in the same annotation pipeline.
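In an automatic pipeline, the information attached to predicted elements is usually stored in a standard feature format. The sketch below assumes GFF3, one common choice that genome browsers such as GBrowse can display: each feature becomes a tab-separated line of nine columns, with 1-based inclusive coordinates and `key=value` attributes in which the characters `;`, `=`, `&`, and `,` must be percent-encoded. The function `to_gff3`, the source label `sketch`, and the `orfN` identifier scheme are hypothetical; input coordinates are assumed to be 0-based and end-exclusive.

```python
# Characters reserved in GFF3 column 9, plus tab/newline and '%' itself, are percent-encoded.
_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def _escape(value: str) -> str:
    return "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)


def to_gff3(seqid: str, features: list[tuple[int, int, str, str]], source: str = "sketch") -> str:
    """Format (start, end, strand, note) features as GFF3 text.

    Input start/end are 0-based, end-exclusive; GFF3 columns 4-5 are 1-based, inclusive.
    """
    lines = ["##gff-version 3"]
    for n, (start, end, strand, note) in enumerate(features, start=1):
        attributes = f"ID=orf{n};Note={_escape(note)}"
        # seqid, source, type, start, end, score, strand, phase, attributes
        columns = [seqid, source, "ORF", str(start + 1), str(end), ".", strand, ".", attributes]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"
```

A manual curator could later edit the `Note` attribute of such a record, which is one way the automatic and manual approaches can share the same pipeline.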
For data analysis and processing at Cenicafé we use the following machines:
Type | Operating system | Processor | RAM | Hard disk | Functionality / applications
Apple Xserve | Mac OS X Server 10.5.8 | 2 × 3.0 GHz quad-core Intel Xeon | 4 GB | 1.78 TB | InterProScan, BLAST, SGE, OrthoMCL
Apple Xserve | Mac OS X Server 10.5.8 | 2 × 2.0 GHz dual-core Intel Xeon | 4 GB | 1.78 TB | InterProScan, BLAST, SGE, OrthoMCL
IBM xSeries 346 | SUSE Linux 10 SP1 | 4 × 3.6 GHz Intel Xeon | 5 GB | 1.1 TB | Web server, database server, GBrowse, BLAST, TGICL, EMBOSS, BioPerl
Sun Fire V240 | Solaris 5.1 | 2 × 1.0 GHz | 4 GB | 140 GB | Assembly
IBM e325 | Ubuntu 8.10 | 2 × 2.4 GHz AMD Opteron | 5 GB | 65 GB | Mapping server, FPC
IBM e325 | Ubuntu Server | 2 × 2.4 GHz AMD Opteron | 5 GB | 65 GB | 454 server
IBM e325 | Ubuntu Server | 2 × 2.4 GHz AMD Opteron | 5 GB | 65 GB | Test server
IBM e325 | Ubuntu Desktop 9.10 (32-bit) | 2 × 2.4 GHz AMD Opteron | 5 GB | 65 GB | MIRA
IBM xSeries 345 | SUSE Linux Enterprise Server 10 SP1 | 2 × 3.06 GHz Intel Xeon | 4 GB | 280 GB | Public web server
Power Mac G5 | Mac OS X 10.4.11 | Dual 2.7 GHz PowerPC G5 | 2.5 GB | 500 GB | InterProScan, BLAST, CodonCode Aligner
Mac Pro (desktop) | Mac OS X 10.6.2 | 2 × 2.6 GHz quad-core Intel Xeon | 10 GB | 1 TB | Sequence analysis
iMac (desktop) | Mac OS X 10.6.2 | 2 × 2.0 GHz dual-core Intel Xeon | 4 GB | 1 TB | Database administration
iMac (desktop) | Mac OS X 10.6.2 | 2 × 2.0 GHz dual-core Intel Xeon | 4 GB | 1 TB | Platform administration