David Notes

Monday, Jun 27, 2011

The following command was executed to copy the quality-trimmed Illumina files from biosge003 to biosge002:

root@biosge003:/home/dm-riano122/illumina/Limpias# rsync -avz *.fasta root@biosge002:/home/dm-riano122/Illumina/

Tuesday, Jun 28, 2011
454

The first batch of downloaded 454 data was decompressed in /home/dm-riano122/454/SFF/:

biosge003:/home/dm-riano122/454/SFF# for FILE in /biologia-scratch/SeqsRoya/SFF/LimpiezaUniandes/*.zip; do unzip "$FILE"; done
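A slightly more defensive version of this loop is sketched below as a hypothetical `unzip_all` function. The function name and the `-q`/`-n`/`-d` choices are assumptions, not what was actually run. It quotes each path, skips the unexpanded `*.zip` pattern when the directory contains no archives, and extracts into an explicit target directory without overwriting files that are already there.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original run): extract every .zip
# archive found in SRC_DIR into DEST_DIR and print how many were processed.
unzip_all() {
  local src_dir=$1 dest_dir=$2 archive count=0
  mkdir -p "$dest_dir"
  for archive in "$src_dir"/*.zip; do
    # If no archive matches, the glob stays literal; skip it instead of calling unzip.
    [ -e "$archive" ] || continue
    # -q: quiet, -n: never overwrite existing files, -d: extract into DEST_DIR.
    unzip -q -n "$archive" -d "$dest_dir" || return 1
    count=$((count + 1))
  done
  echo "$count"
}

# Example with the paths from these notes:
# unzip_all /biologia-scratch/SeqsRoya/SFF/LimpiezaUniandes /home/dm-riano122/454/SFF
```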

The complete set of 454 genome data totals 14.5 GB.

Then I continued with the extraction:

biosge003:/home/dm-riano122/454/SFF# sff_extract_0_2_8 -c -Q * -o TodoRoya454

This command extracts the reads to FASTQ format, clipping the adaptor and low-quality regions specified in the SFF files. All the files were joined into one file (TodoRoya.fastq). The output of the process is shown below:

Working on ‘2011_04_28_AlvaroGaitan_HvCat_7-1_SFF.sff’:
Converting ‘2011_04_28_AlvaroGaitan_HvCat_7-1_SFF.sff’ … done.
Converted 690238 reads into 690238 sequences.
Working on ‘2011_04_28_AlvaroGaitan_HvCat_7-2_SFF.sff’:
Converting ‘2011_04_28_AlvaroGaitan_HvCat_7-2_SFF.sff’ … done.
Converted 690061 reads into 690061 sequences.
Working on ‘2011_05_19_AlvaroGaitan_HvCat_7-1_SFF.sff’:
Converting ‘2011_05_19_AlvaroGaitan_HvCat_7-1_SFF.sff’ … done.
Converted 596783 reads into 596783 sequences.
Working on ‘2011_05_19_AlvaroGaitan_HvCat_7-2_SFF.sff’:
Converting ‘2011_05_19_AlvaroGaitan_HvCat_7-2_SFF.sff’ … done.
Converted 592580 reads into 592580 sequences.
Working on ‘2011_05_20_AlvaroGaitan_HvCat_7-1_SFF.sff’:
Converting ‘2011_05_20_AlvaroGaitan_HvCat_7-1_SFF.sff’ … done.
Converted 695141 reads into 695141 sequences.
Working on ‘2011_05_20_AlvaroGaitan_HvCat_7-2_SFF.sff’:
Converting ‘2011_05_20_AlvaroGaitan_HvCat_7-2_SFF.sff’ … done.
Converted 678516 reads into 678516 sequences.
Working on ‘Hv_2009.sff’:
Converting ‘Hv_2009.sff’ … done.
Converted 516834 reads into 516834 sequences.
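As a cross-check, the seven per-file counts add up to 690238 + 690061 + 596783 + 592580 + 695141 + 678516 + 516834 = 4460153 reads, the same number reported as input to the trimming step. A minimal awk sketch that computes this sum from a saved copy of the log, assuming the output above was redirected to a file (the file name `sff_extract.log` and the function name are hypothetical):

```bash
#!/usr/bin/env bash
# Sum the per-file read counts printed by sff_extract
# ("Converted N reads into N sequences.") from a saved log file.
sum_converted_reads() {
  awk '/^Converted [0-9]+ reads/ { total += $2 } END { print total + 0 }' "$@"
}

# Example: sum_converted_reads sff_extract.log
```

For the log shown above, the function prints 4460153.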

The final file (TodoRoya.fastq) has 8597445 reads. FastQC was run on this file.
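A quick way to count the reads in a FASTQ file is to divide its line count by four. This sketch assumes every record uses exactly four lines (header, sequence, `+`, qualities) with no line wrapping; `count_fastq_reads` is a hypothetical helper name. Counting records directly like this is a cheap way to compare a file against the per-file totals in the sff_extract log.

```bash
#!/usr/bin/env bash
# Count reads in an unwrapped FASTQ file (assumes 4 lines per record).
count_fastq_reads() {
  local lines
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 1; }
  lines=$(wc -l < "$1" | tr -d ' ')
  # A line count that is not a multiple of 4 suggests wrapped or truncated records.
  if [ $((lines % 4)) -ne 0 ]; then
    echo "warning: $1 has $lines lines, not a multiple of 4" >&2
  fi
  echo $((lines / 4))
}

# Example: count_fastq_reads TodoRoya.fastq
```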

Next, the 454 reads were extracted in FASTA format, together with their .qual file:

sff_extract_0_2_8 -c * -o TodoRoya
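Judging from the file names used in the trimming step (TodoRoya.fasta and TodoRoya.fasta.qual), this run produced a FASTA file plus a matching quality file. Both files should list the same reads in the same order, so comparing their header identifiers is a cheap sanity check before trimming. A minimal sketch, assuming the read identifier is the first space-separated word of each `>` header; `check_fasta_qual_pair` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Check that a FASTA file and its .qual file list the same read IDs in the same order.
check_fasta_qual_pair() {
  local fasta=$1 qual=$2 ids_a ids_b status
  ids_a=$(mktemp) && ids_b=$(mktemp) || return 1
  # Keep only the first word of each header line (e.g. ">read1 length=300" -> ">read1").
  grep '^>' "$fasta" | cut -d' ' -f1 > "$ids_a"
  grep '^>' "$qual" | cut -d' ' -f1 > "$ids_b"
  if cmp -s "$ids_a" "$ids_b"; then
    echo "OK: $(wc -l < "$ids_a" | tr -d ' ') records in both files"
    status=0
  else
    echo "MISMATCH: read identifiers differ between $fasta and $qual" >&2
    status=1
  fi
  rm -f "$ids_a" "$ids_b"
  return "$status"
}

# Example: check_fasta_qual_pair TodoRoya.fasta TodoRoya.fasta.qual
```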

Quality trimming was then run on the extracted FASTA file, with this result:

Input reads: 4460153
Input residues: 1728122092
Output reads: 2486886 55.76 %
Output residues: 732529492 42.39 %

Quality range: 0 to 40
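The percentages in this summary are output divided by input, and dividing residues by reads gives the mean read length. A small awk sketch with hypothetical helper names that reproduces these figures:

```bash
#!/usr/bin/env bash
# Percentage of the input that was kept, to two decimals (as in the summary above).
percent_kept() {
  awk -v total="$1" -v kept="$2" 'BEGIN { printf "%.2f\n", 100 * kept / total }'
}

# Mean read length in bases (residues / reads), to one decimal.
mean_length() {
  awk -v residues="$1" -v reads="$2" 'BEGIN { printf "%.1f\n", residues / reads }'
}

# Example with the numbers from this run:
# percent_kept 4460153 2486886        # reads kept
# percent_kept 1728122092 732529492   # residues kept
```

For this run, `percent_kept` gives 55.76 for reads and 42.39 for residues, and `mean_length` gives about 387.5 bases per read before trimming and 294.6 after.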

Then assembly was done:

biosge003:/home/dm-riano122/454/CLC_Assembly# time clc_novo_assemble -q TodoRoya_trim.fasta -o TodoRoya_trim_CLC-Assembly.fasta --cpus 20 &

The assembly stopped unexpectedly with the following error:

Fasta file format error in file (null)

I suspect the problem was that the first trimming run did not discard reads left with no bases (blank entries).
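One way to test this suspicion before re-running the assembly is to count FASTA records whose sequence is empty. A minimal awk sketch, assuming a standard FASTA file in which every header starts with `>`; `count_empty_fasta_records` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Count FASTA records that have a header but no sequence bases.
count_empty_fasta_records() {
  awk '
    /^>/ {
      # A new header: the previous record is empty if it collected no bases.
      if (in_record && seqlen == 0) empty++
      in_record = 1; seqlen = 0; next
    }
    {
      # Ignore spaces, tabs and carriage returns when measuring sequence length.
      gsub(/[ \t\r]/, "")
      seqlen += length($0)
    }
    END {
      # The last record in the file also needs checking.
      if (in_record && seqlen == 0) empty++
      print empty + 0
    }
  ' "$1"
}

# Example: count_empty_fasta_records TodoRoya_trim.fasta
```

A non-zero result would support the blank-entry explanation; it does not by itself prove that these records caused the CLC error.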

Trimming was therefore repeated with the option -m 70, to discard reads shorter than 70 bases:

time quality_trim -r TodoRoya.fasta -q TodoRoya.fasta.qual -o TodoRoya_trim.fasta -m 70&

Here is the output:

Input reads: 4460153
Input residues: 1728122092

Output reads: 2343431 52.54 %
Output residues: 725435138 41.98 %

Quality range: 0 to 40

real 2m28.165s
user 2m23.965s
sys 0m3.916s
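Assuming the two trimming runs differed only in the -m 70 option (the first command is not recorded here), subtracting the second summary from the first gives the reads removed by the length filter. The sketch below does that arithmetic; `compare_trim_runs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Compare two quality_trim summaries: arguments are reads1 residues1 reads2 residues2.
compare_trim_runs() {
  local removed_reads=$(( $1 - $3 ))
  local removed_residues=$(( $2 - $4 ))
  echo "reads removed: $removed_reads"
  echo "residues removed: $removed_residues"
  # Mean length of the removed reads (only meaningful if some reads were removed).
  if [ "$removed_reads" -gt 0 ]; then
    awk -v r="$removed_residues" -v n="$removed_reads" \
      'BEGIN { printf "mean length of removed reads: %.2f\n", r / n }'
  fi
}

compare_trim_runs 2486886 732529492 2343431 725435138
# reads removed: 143455
# residues removed: 7094354
# mean length of removed reads: 49.45
```

The removed reads average about 49 bases, below the 70-base cutoff, which is consistent with the second run dropping short and empty reads as intended.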

The assembly was then run again.

Illumina

An assembly was run with the trimmed data:

biosge002:/home/dm-riano122/Illumina# time clc_novo_assemble -o HvTotalIlluminaContigAssemblyCLC.fasta -q Hv387_trim.fasta Hv494_trim.fasta HvCat_4_trim.fasta HvDQ952_trim.fasta HvH_179_trim.fasta HvH_569_trim.fasta HvH_701_trim.fasta HvMar_1_trim.fasta -v -p fb ss 200 400 --cpus 20 &