Mira and CLC Assembly 454 Reads - HvCat Illumina

Working dir: bio@ticuna:/data/process/Roya/assemblies/miraAssembly

HvCat Quality trim

To assemble the 454 reads together with the HvCat Illumina reads, the HvCat data must be quality-trimmed first; the 454 data is already clean.

bio@ticuna:/data/process/Roya/cleanData$ quality_trim -p HvCat_Trim.fastq -r -i HvCat_1.fastq HvCat_2.fastq
Input reads: 48396016
Input residues: 5323561760
Output reads: 43704716 90.31 %
Output residues: 4668264337 87.69 %
Quality range: 2 to 40
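
The percentages reported by quality_trim follow directly from the read and residue counts above. As a quick sanity check, here is a minimal Python sketch; the helper name `retention_percent` is only illustrative and is not part of the CLC tools.

```python
def retention_percent(kept: int, total: int) -> float:
    """Percentage of `total` represented by `kept`, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * kept / total, 2)


# Counts copied from the quality_trim output above.
reads_pct = retention_percent(43_704_716, 48_396_016)
residues_pct = retention_percent(4_668_264_337, 5_323_561_760)

print(f"reads kept: {reads_pct} %, residues kept: {residues_pct} %")
```

Residues drop by a larger fraction than reads because trimming shortens many reads without discarding them.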

Removing Duplicates

remove_duplicates -r -p HvCat_Trim.fastq -o HvCat_Trim_NoDuplicates.fastq
Progress: 100.0 %

CLC Hybrid Assembly 454-Illumina

time clc_novo_assemble -o CLCAssemblyAll454HvCatIllum.fasta -q ../../cleanData/RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta -q -p fb ss 200 400 ../../cleanData/HvCat_Trim_NoDuplicates.fastq
Progress: 100.0 %
real 1022m2.848s
user 1152m42.920s
sys 1m14.200s
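
The `time` output is given in minutes and seconds, which is hard to read for runs of this length. The following is a small Python sketch that converts such a field into hours or days; the function name `parse_time_field` is an assumption made for this example, and it only handles the `<minutes>m<seconds>s` form shown on this page.

```python
import re


def parse_time_field(value: str) -> float:
    """Convert a bash `time` field such as '1022m2.848s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# 'real' values copied from this page: the CLC assembly and the MIRA mapping run.
clc_hours = parse_time_field("1022m2.848s") / 3600
mapping_days = parse_time_field("52277m13.242s") / 86400

print(f"CLC assembly: {clc_hours:.1f} h, MIRA mapping: {mapping_days:.1f} days")
```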

sequence_info -n -r CLCAssemblyAll454HvCatIllum.fasta

File                           CLCAssemblyAll454HvCatIllum.fasta

Number of sequences                254645

Residue counts:
  Number of A's                  53429695   32.69 %
  Number of C's                  26690995   16.33 %
  Number of G's                  27020071   16.53 %
  Number of T's                  53547077   32.77 %
  Number of N's                   2732264    1.67 %
  Total                         163420102

Sequence lengths:
  Minimum                             200
  Maximum                           44736
  Average                             641.76
  N50                                 925
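
For reference, N50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of all assembled bases. The sketch below implements that common definition in Python; it is an assumption that sequence_info uses exactly the same convention, and the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, not data from this page.
example_lengths = [2, 3, 4, 5, 6]
print(n50(example_lengths))
```

Because N50 depends on the shortest contigs kept, the minimum contig length of each assembly should be taken into account when comparing N50 values between assemblers.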

MIRA Hybrid Assembly 454-Illumina

Create symbolic links to avoid copying or moving the data

ln -s ../../cleanData/RoyaTotal9sff_trim_seqclean_mdust_noDuplicates.fasta miraHvTotal9sffHvCatIllum_in.454.fasta

ln -s ../../cleanData/HvCat_Trim_NoDuplicates.fastq miraHvTotal9sffHvCatIllum_in.solexa.fastq

Assembly Strategy 1:

Step 1: Assemble the 454 reads, then map the Illumina reads. Started 03-03-2012 at 09 am

mira -project=miraHvTotal9sff -job=denovo,genome,accurate,454 -GENERAL:number_of_threads=2 > miraHvTotal9sff.log
Finished 21-03-2012

sequence_info -n -r miraHvTotal9sff_out.unpadded.fasta

File                           miraHvTotal9sff_out.unpadded.fasta

Number of sequences                167996

Residue counts:
  Number of A's                  53730726   32.82 %
  Number of C's                  27880040   17.03 %
  Number of G's                  28190133   17.22 %
  Number of T's                  53742402   32.82 %
  Number of N's                    192384    0.12 %
  Total                         163735685

Sequence lengths:
  Minimum                              40
  Maximum                          177530
  Average                             974.64
  N50                                1033
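
The residue counts reported for the CLC and MIRA assemblies can be turned into a GC content figure for a quick comparison. This is a minimal Python sketch; the function name `gc_percent` is illustrative, and N residues are excluded from the denominator, which is one of several possible conventions.

```python
def gc_percent(a: int, c: int, g: int, t: int) -> float:
    """GC content in percent, computed over unambiguous bases only."""
    acgt = a + c + g + t
    if acgt == 0:
        raise ValueError("no A/C/G/T residues")
    return 100.0 * (c + g) / acgt


# Residue counts copied from the two sequence_info reports on this page.
clc_gc = gc_percent(53_429_695, 26_690_995, 27_020_071, 53_547_077)
mira_gc = gc_percent(53_730_726, 27_880_040, 28_190_133, 53_742_402)

print(f"CLC GC: {clc_gc:.2f} %, MIRA GC: {mira_gc:.2f} %")
```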

Step 2: Filter the results

This step extracts the 'long' contigs from the previous assembly. The idea is to keep all contigs larger than 500 bases.

/opt/mira_3.4.0/bin/convert_project -f caf -t caf -x 500 miraHvTotal9sff_assembly/miraHvTotal9sff_d_results/miraHvTotal9sff_out.caf hybrid_backbone_in
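
To illustrate the idea of the length filter only, here is a small Python sketch that applies a minimum-length cutoff to FASTA records. It is not a replacement for convert_project, which operates on the CAF file and keeps the read placements; the names `read_fasta` and `filter_min_length`, the inclusive cutoff, and the toy records are all assumptions for this example.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA text lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_min_length(records, min_len=500):
    """Keep records whose sequence has at least `min_len` bases (inclusive)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


# Hypothetical toy input with a cutoff of 5 instead of 500.
toy = [">c1", "ACGT", ">c2", "ACGTAC", "GT", ">c3", "ACGTA"]
kept = filter_min_length(read_fasta(toy), min_len=5)
print([name for name, _ in kept])
```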

ln -s ../../cleanData/HvCat_Trim.fastq hybrid_in.solexa.fastq

Step 3: Map the Solexa reads to the assembly of long 454 contigs

time mira --project=hybrid --job=mapping,genome,accurate,solexa -AS:nop=1 -SB:bft=caf SOLEXA_SETTINGS -CO:msr=no -GE:uti=no:tismin=80:tismax=180 -SB:ads=yes:dsn=hybrid > log_HybridAssembly.txt
tcmalloc: large alloc 10494050304 bytes 0x90753a000
tcmalloc: large alloc 14019903488 bytes 0x14e9aed000
tcmalloc: large alloc 10028249088 bytes 0x14e9aed000
tcmalloc: large alloc 13775097856 bytes 0x14e9aed000
tcmalloc: large alloc 9766612992 bytes 0x14e9aed000
tcmalloc: large alloc 17487106048 bytes 0x182ef59000
real 52277m13.242s
user 13907553m48.160s
sys 300491954m11.300s

Output error: File:log HybridAssembly.txt.zip