Third Hybrid Assembly of 454 and Illumina data with CLC

The Second Hybrid Assembly of 454 and Illumina data with CLC was missing two files: HvCat_7-1_2011_7.sff and HvCat_7-2_2011_7.sff. This third assembly includes them.

Extracting SFF

sff_extract -c HvCat_7* -o HvCat_7

Counting Reads

grep -v '>' HvCat_7.fasta | wc
1400293 1400293 602104524
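
The three numbers printed by wc are lines, words and characters. Each read sits on a single sequence line here, so the line count is the read count, and the character count includes one newline per read: 602104524 - 1400293 = 600704231 residues, the same figure quality_trim reports as "Input residues". A minimal sketch of a helper that returns both numbers directly and also copes with wrapped FASTA; the name fasta_counts is hypothetical and not part of any tool used on this page.

```shell
# fasta_counts FILE: print "<reads> <residues>" for a FASTA file.
# Hypothetical helper, shown only as a cross-check for grep | wc.
fasta_counts() {
    awk '
        /^>/ { reads++; next }            # header line: one per read
             { residues += length($0) }   # sequence line: newline not counted
        END  { printf "%d %d\n", reads, residues }
    ' "$1"
}
```

Running fasta_counts HvCat_7.fasta would be expected to agree with the read and residue counts shown in this section.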

Quality Trim

/data/process/Roya/RawDataRoya/SffRoya# time /opt/CLC/clc-assembly-cell-4.0.1beta-linux_64/quality_trim -r HvCat_7.fasta -q HvCat_7.fasta.qual -m 70 -o HvCat_7_trim.fasta
Input reads: 1400293
Input residues: 600704231
Output reads: 832456 59.45 %
Output residues: 279812367 46.58 %
Quality range: 0 to 40
real 1m27.413s
user 1m4.070s
sys 0m3.850s
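
The percentages in the quality_trim report can be recomputed from the raw counts. Reads are retained at a higher rate than residues, so the surviving reads were also shortened: the mean read length drops from roughly 429 bases (600704231 / 1400293) to roughly 336 bases (279812367 / 832456). A small sketch of that arithmetic; pct is a hypothetical helper name.

```shell
# pct PART TOTAL: print PART as a percentage of TOTAL with two decimals.
# Hypothetical helper for re-deriving the figures in the report above.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * part / total }'
}

pct 832456 1400293        # reads kept     -> 59.45
pct 279812367 600704231   # residues kept  -> 46.58
```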

/opt/seqclean/seqclean HvCat_7_trim.fasta -m 70 -o HvCat_7_trim_seqclean.fasta -N -A
seqclean running options:
seqclean HvCat_7_trim.fasta -m 70 -o HvCat_7_trim_seqclean.fasta -N -A
Standard log file: seqcl_HvCat_7_trim.fasta.log
Error log file: err_seqcl_HvCat_7_trim.fasta.log
Using 1 CPUs for cleaning
Rebuilding HvCat_7_trim.fasta cdb index
Launching actual cleaning process:
psx -p 1 -n 1000 -i HvCat_7_trim.fasta -d cleaning -C '/data/process/Roya/RawDataRoya/SffRoya/HvCat_7_trim.fasta:LMS100:::11:0' -c '/opt/seqclean/bin/seqclean.psx'
Collecting cleaning reports
**************************************************
Sequences analyzed: 832456
-----------------------------------
valid: 817988 (0 trimmed)
trashed: 14468
**************************************************
----= Trashing summary =------
by 'short': 14062
by 'dust': 406
Output file containing only valid and trimmed sequences: HvCat_7_trim_seqclean.fasta
For trimming and trashing details see cleaning report : HvCat_7_trim.fasta.cln
seqclean (HvCat_7_trim.fasta) finished on machine in /data/process/Roya/RawDataRoya/SffRoya, without a detectable error.
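
The trashing summary can be rebuilt from the cleaning report (HvCat_7_trim.fasta.cln). The sketch below assumes the report is tab-delimited with the trash code in column 6 and an empty code for valid sequences; that layout is an assumption and should be checked against the installed seqclean version. trash_summary is a hypothetical helper name.

```shell
# trash_summary CLN_FILE: count sequences per trash code in a seqclean report.
# Assumption: tab-delimited fields, trash code in column 6, empty when valid.
trash_summary() {
    awk -F '\t' '
        { code = $6; gsub(/[ \t]/, "", code) }   # strip any padding
        code != "" { n[code]++ }                 # tally trashed sequences only
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | sort
}
```

If the assumption holds, trash_summary HvCat_7_trim.fasta.cln should list the same codes as the summary above, and the counts should add up to the trashed total (14062 + 406 = 14468).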

Masking low-complexity regions (mdust)

time /opt/tgicl_linux/bin/mdust HvCat_7_trim_seqclean.fasta > HvCat_7_trim_seqclean_mdust.fasta
real 9m33.119s
user 8m31.700s
sys 0m3.670s
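
To see how much sequence mdust touched, the share of N characters can be compared between the input and output files. This sketch assumes mdust was run with its default masking letter (N); masked_fraction is a hypothetical helper name, and any N already present in the reads before masking is counted too, so the difference between the two files is the meaningful number.

```shell
# masked_fraction FILE: percentage of sequence characters that are N or n.
# Assumption: low-complexity regions are masked with N (mdust default).
masked_fraction() {
    awk '
        /^>/ { next }                                   # skip headers
             { total += length($0); masked += gsub(/[Nn]/, "") }
        END  { if (total) printf "%.2f\n", 100 * masked / total
               else       print "0.00" }
    ' "$1"
}
```

For example, run masked_fraction on HvCat_7_trim_seqclean.fasta and on HvCat_7_trim_seqclean_mdust.fasta and compare the two values.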

Concatenating the previous clean data with HvCat_7

/data/process/Roya/CleanData# mv ../RawDataRoya/SffRoya/HvCat_7_trim_seqclean_mdust.fasta .

root@ticuna:/data/process/Roya/CleanData# ls

454RoyaTodas_trim_seqclean_Mdust.fasta HvCat_7_trim_seqclean_mdust.fasta Illumina_Trim_NoDuplicatesAgain_Mdust.fasta

root@ticuna:/data/process/Roya/CleanData# cat 454RoyaTodas_trim_seqclean_Mdust.fasta HvCat_7_trim_seqclean_mdust.fasta > RoyaTotal9sff_trim_seqclean_mdust.fasta
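
A quick sanity check after cat is that the combined file holds exactly as many records as its parts. A minimal sketch; count_reads and check_concat are hypothetical helper names.

```shell
# count_reads FILE: number of FASTA records (lines starting with '>').
count_reads() { grep -c '^>' "$1"; }

# check_concat OUT PART...: succeed if OUT has as many records as all PARTs.
check_concat() {
    out=$1; shift
    expected=0
    for f in "$@"; do
        expected=$((expected + $(count_reads "$f")))
    done
    actual=$(count_reads "$out")
    echo "expected=$expected actual=$actual"
    [ "$expected" -eq "$actual" ]
}
```

Usage here would be: check_concat RoyaTotal9sff_trim_seqclean_mdust.fasta 454RoyaTodas_trim_seqclean_Mdust.fasta HvCat_7_trim_seqclean_mdust.fasta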

Deleting Duplicates

/opt/CLC/clc-assembly-cell-4.0.1beta-linux_64/remove_duplicates -p -r RoyaTotal9sff_trim_seqclean_mdust.fasta -o RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta

grep -c '>' RoyaTotal9sff_trim_seqclean_mdust.fasta

grep -c '>' RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta
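
The two counts are easier to read as a difference. A small sketch that reports how many records were dropped and what share of the input that is; dup_report is a hypothetical helper name, and the pattern is anchored to the start of the line so only header lines are counted.

```shell
# dup_report BEFORE AFTER: report how many FASTA records were removed.
dup_report() {
    before=$(grep -c '^>' "$1")
    after=$(grep -c '^>' "$2")
    awk -v b="$before" -v a="$after" 'BEGIN {
        printf "before=%d after=%d removed=%d (%.2f%%)\n",
               b, a, b - a, (b ? 100 * (b - a) / b : 0)
    }'
}
```

Usage here would be: dup_report RoyaTotal9sff_trim_seqclean_mdust.fasta RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta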

Hybrid Assembly with CLC

/data/process/Roya/Ensamblajes/ThirdHybridAssemblyCLC# time clc_novo_assemble -o ThirdHybridAssemblyCLC.fasta -q ../../CleanData/RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta -q -p fb ss 200 400 ../../CleanData/Illumina_Trim_NoDuplicatesAgain_Mdust.fasta --cpus 4
Progress: 100.0 %
real 2194m48.471s
user 3372m57.780s
sys 9m21.610s

grep -c '>' ThirdHybridAssemblyCLC.fasta
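
The contig count alone says little about contiguity, so it helps to look at total length, longest contig and N50 as well. A minimal sketch using only awk and sort; contig_stats is a hypothetical helper name, N50 is taken here as the length of the contig at which the running sum of lengths (longest first) reaches half the total, and zero-length records are ignored.

```shell
# contig_stats FILE: print contig count, total length, longest contig and N50.
contig_stats() {
    awk '
        /^>/ { if (len) print len; len = 0; next }   # emit previous contig length
             { len += length($0) }                   # accumulate wrapped lines
        END  { if (len) print len }
    ' "$1" | sort -rn | awk '
        { l[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "contigs=0"; exit }
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += l[i]
                if (run >= half) { n50 = l[i]; break }
            }
            printf "contigs=%d total=%d longest=%d N50=%d\n", NR, total, l[1], n50
        }
    '
}
```

Usage here would be: contig_stats ThirdHybridAssemblyCLC.fasta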