Assembly of All Beauveria Concatenated Reads

Concatenating Reads

/data/process/Beauveria/Assemblies/BbAssemblyReadsCLC$ cat ../../TrimDataBeauveria/Bb9001_trim_NoDuplicates.fasta \
    ../../TrimDataBeauveria/Bb9024_trim_NoDuplicates.fasta \
    ../../TrimDataBeauveria/Bb9119_trim_NoDuplicates.fasta \
    ../../TrimDataBeauveria/Bb9205_trim_NoDuplicates.fasta \
    > BbAllReads_trim_NoDuplicates.fasta
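A quick sanity check after concatenating is to confirm that the combined file holds as many FASTA records as the input files together. The following is a minimal sketch, not part of the original workflow: `count_fasta_records` is a hypothetical helper that simply counts header lines starting with `>`, and the demonstration runs on small throwaway files rather than the real reads.

```shell
# Hypothetical helper (not part of the original workflow): count FASTA records
# by counting header lines that start with '>'.
count_fasta_records() {
    cat "$@" | grep -c '^>'
}

# Tiny self-contained demonstration with throwaway files.
tmpdir=$(mktemp -d)
printf '>r1\nACGT\n>r2\nGGCC\n' > "$tmpdir/a.fasta"
printf '>r3\nTTAA\n' > "$tmpdir/b.fasta"
cat "$tmpdir/a.fasta" "$tmpdir/b.fasta" > "$tmpdir/all.fasta"

inputs=$(count_fasta_records "$tmpdir/a.fasta" "$tmpdir/b.fasta")
combined=$(count_fasta_records "$tmpdir/all.fasta")
echo "inputs=$inputs combined=$combined"

rm -rf "$tmpdir"
```

On the real data the same function could be called once with the four trimmed input files and once with BbAllReads_trim_NoDuplicates.fasta; the two counts should agree.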

Assembling All Contigs

time clc_novo_assemble -q -p fb ss 100 400 BbAllReads_trim_NoDuplicates.fasta -o BbAllReads_trim_NoDuplicates_CLC.fasta
Progress: 100.0 %
real 148m44.888s
user 337m14.270s
sys 0m53.080s

sequence_info -n -r BbAllReads_trim_NoDuplicates_CLC.fasta 

File                           BbAllReads_trim_NoDuplicates_CLC.fasta

Number of sequences                 27328

Residue counts:
  Number of A's                  11950983   25.40 %
  Number of C's                  11295877   24.01 %
  Number of G's                  11285779   23.99 %
  Number of T's                  11941882   25.38 %
  Number of N's                    569030    1.21 %
  Total                          47043551

Sequence lengths:
  Minimum                             200
  Maximum                          234937
  Average                            1721.44
  N50                               11762
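For reference, N50 is usually defined as the contig length at which contigs of that length or longer contain at least half of all assembled bases. The sketch below computes it under that standard definition; whether sequence_info uses exactly the same tie-breaking is an assumption, and `n50` is a hypothetical helper that expects one contig length per line on standard input.

```shell
# Hypothetical helper: read one contig length per line on stdin, print the N50.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }          # store lengths, longest first
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                acc += len[i]                   # running sum of bases
                if (acc >= half) { print len[i]; exit }
            }
        }'
}

# Toy example (made-up lengths): the total is 100, and the two longest
# contigs (40 + 30) are the first to reach half of it, so the N50 is 30.
printf '%s\n' 10 40 20 30 | n50
```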

time clc_ref_assemble_long -q -p fb ss 100 400 BbAllReads_trim_NoDuplicates.fasta -d BbAllReads_trim_NoDuplicates_CLC.fasta -o BbAllReads_trim_NoDuplicates_CLC.fasta.cas
Progress: 100.0 %
real 46m23.522s
user 93m56.480s
sys 1m1.580s

assembly_info -p fb ss 100 400 BbAllReads_trim_NoDuplicates_CLC.fasta.cas > BbAllReads_trim_NoDuplicates_CLC.fasta.cas.txt

less BbAllReads_trim_NoDuplicates_CLC.fasta.cas.txt 

General info:

  Program name         clc_ref_assemble_long
  Program version      4.01beta.59919
  Program parameters   -q -p fb ss 100 400 BbAllReads_trim_NoDuplicates.fasta -d BbAllReads_trim_NoDuplicates_CLC.fasta -o BbAllReads_trim_NoDuplicates_CLC.fasta.cas

  Contig files:
    BbAllReads_trim_NoDuplicates_CLC.fasta [ 27328 / 47043551 ]

  Read files:
    BbAllReads_trim_NoDuplicates.fasta [ 122842984 / 10721226169 ] <paired>

Read info:

  Contigs                         27328
  Reads                       122842984
    Unmapped reads              1987626    1.62 %
    Mapped reads              120855358   98.38 %
      Multi hit reads           1249880    1.03 %
    Paired                    102067266   83.09 %
    Unpaired                   20775718   16.91 %

Paired end info:

  Paired reads                102067296   83.09 %
    Average distance                263.54
    99.9 % of pairs between         101 - 398
    99.0 % of pairs between         109 - 385
    95.0 % of pairs between         132 - 358

  Unpaired reads               20775688   16.91 %
    Both seqs not matching       279876    1.35 %
    One seq not matching        3415500   16.44 %
    Both seqs matching         17080312   82.21 %
      Different contigs        15061532   88.18 %
      Wrong directions          1408976    8.25 %
      Too close                  299946    1.76 %
      Too far                    309858    1.81 %

Coverage info:

  Mapped nucleotides        10343442828   96.48 %
  Total sites                  47043551
  Average coverage                  219.87
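The coverage figures can be recomputed from the counts reported above. This short sketch does the arithmetic with awk; it assumes that the second number in the "Read files" line (10721226169) is the total number of nucleotides in the read file, which is consistent with the reported percentage.

```shell
# Values copied from the assembly_info output above.
mapped_nt=10343442828      # Mapped nucleotides
total_sites=47043551       # Total sites (assembled bases)
total_read_nt=10721226169  # Nucleotides in the read file (assumed meaning)

# Average coverage = mapped nucleotides / total sites.
avg_cov=$(awk -v m="$mapped_nt" -v s="$total_sites" 'BEGIN { printf "%.2f", m / s }')

# Share of read nucleotides that were mapped, in percent.
mapped_pct=$(awk -v m="$mapped_nt" -v t="$total_read_nt" 'BEGIN { printf "%.2f", 100 * m / t }')

echo "average coverage: $avg_cov"
echo "mapped nucleotides: $mapped_pct %"
```

Both results match the report (219.87 and 96.48 %).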

R Statistics

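library(ggplot2)  # needed for the plots below
# Assumes the file holds the per-contig table with a header row (Contig, Sites, Reads, Coverage)
a <- read.table("BbAllReads_trim_NoDuplicates_CLC.fasta.cas.txt", header = TRUE)
View(a)
summary(a)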

    Contig          Sites            Reads            Coverage       
 Min.   :    1   Min.   :   200   Min.   :      1   Min.   :    0.36  
 1st Qu.: 6833   1st Qu.:   292   1st Qu.:     20   1st Qu.:    4.36  
 Median :13664   Median :   428   Median :     53   Median :    6.93  
 Mean   :13664   Mean   :  1721   Mean   :   4422   Mean   :  116.71  
 3rd Qu.:20496   3rd Qu.:   783   3rd Qu.:    726   3rd Qu.:  119.16  
 Max.   :27328   Max.   :234937   Max.   :9703805   Max.   :32312.62  
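Note that the mean per-contig coverage in this summary (116.71) is much lower than the average coverage reported by assembly_info (219.87). The most likely explanation is that summary() gives every contig equal weight, while the assembly-wide figure is effectively weighted by contig length, and the many short contigs have low coverage. The sketch below illustrates the difference on a made-up table; `coverage_means` is a hypothetical helper and the numbers are not taken from the assembly.

```shell
# Hypothetical helper: read "contig sites coverage" rows on stdin and print
# the unweighted and the length-weighted mean coverage.
coverage_means() {
    awk '
        { n++; sum_cov += $3; sum_sites += $2; sum_weighted += $2 * $3 }
        END {
            printf "unweighted mean coverage: %.2f\n", sum_cov / n
            printf "length-weighted mean coverage: %.2f\n", sum_weighted / sum_sites
        }'
}

# Toy table (made-up numbers): two short low-coverage contigs, one long
# high-coverage contig.
toy_table='c1 200 5
c2 300 10
c3 9500 220'

printf '%s\n' "$toy_table" | coverage_means
```

The short contigs pull the unweighted mean down, while the length-weighted mean stays close to the coverage of the long contig.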

ReadsInContigsBbAllReads.png

Contig Size

m <- ggplot(a, aes(x = Sites))
m + geom_histogram(binwidth = 70) + xlab("Contig Size") + xlim(0, 5000)

ContigSizeBbAllReads.png

Coverage per Contig

m <- ggplot(a, aes(x = Coverage))
m + geom_histogram(binwidth = 0.3) + xlab("Coverage per contig") + xlim(0, 200)

coveragexcontigAllBb.png

Coverage Representation

# add_labelsFromCoverage() is a user-defined helper that is not shown on this page
categoriesCoverage <- add_labelsFromCoverage(a)
newDataFrame <- merge(as.data.frame(categoriesCoverage), as.data.frame(a))
hist_cut <- ggplot(newDataFrame, aes(x = Sites, fill = CoverageCategory))
# geom_histogram() replaces geom_bar(binwidth = ...), which current ggplot2 no longer accepts
hist_cut + geom_histogram(position = "fill", binwidth = 150) + xlab("Contig Size") + ylab("Proportion of Coverage Category")

CoverageRepresentAllBb.png
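The labelling step used for this plot is not shown on the page. As a rough illustration of what such a step might do, the sketch below adds a CoverageCategory column to a per-contig table by binning the Coverage column. Everything here is an assumption: `label_coverage` is a hypothetical stand-in, the thresholds (5 and 100) and the category names are illustrative, and the rows are made up.

```shell
# Hypothetical stand-in for the labelling step: append a CoverageCategory
# column based on two illustrative thresholds passed as arguments.
label_coverage() {
    awk -v low="$1" -v high="$2" '
        NR == 1 { print $0, "CoverageCategory"; next }   # keep header, add column name
        {
            if ($4 < low)        label = "low"
            else if ($4 <= high) label = "medium"
            else                 label = "high"
            print $0, label
        }'
}

# Made-up table in the same column layout as the per-contig table
# (Contig, Sites, Reads, Coverage).
printf '%s\n' \
    'Contig Sites Reads Coverage' \
    '1 250 4 2.5' \
    '2 900 60 40' \
    '3 5000 9000 300' | label_coverage 5 100
```

The resulting table could then be read into R and used as the fill variable, in the same way as CoverageCategory above.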