Assembly of Bb9119-Bb9205 Beauveria Reads

cat ../../TrimDataBeauveria/Bb9119_trim_NoDuplicates.fasta ../../TrimDataBeauveria/Bb9205_trim_NoDuplicates.fasta > Bb9119Bb9205Reads_trim_NoDuplicates.fasta

time clc_novo_assemble -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -o Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta
Progress: 100.0 %
real 83m38.183s
user 140m59.870s
sys 0m15.870s

sequence_info -n -r Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta 

File                           Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta

Number of sequences                  5804

Residue counts:
  Number of A's                   8665750   24.85 %
  Number of C's                   8716109   25.00 %
  Number of G's                   8713433   24.99 %
  Number of T's                   8664887   24.85 %
  Number of N's                    105096    0.30 %
  Total                          34865275

Sequence lengths:
  Minimum                             200
  Maximum                          125888
  Average                            6007.11
  N50                               24460
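The N50 reported by sequence_info can be cross-checked independently of the CLC tools. A minimal sketch using only awk and sort; the function name n50_from_fasta is hypothetical, and it assumes N50 is defined as the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size:

```shell
# n50_from_fasta: hypothetical helper, not part of the CLC tools.
# Prints the N50 of the sequences in a FASTA file (multi-line records allowed).
n50_from_fasta() {
    awk '
        /^>/ { if (len > 0) print len; len = 0; next }   # header line: emit previous record length
        { gsub(/[ \t\r]/, ""); len += length($0) }        # sequence line: accumulate residues
        END { if (len > 0) print len }                    # emit the last record
    ' "$1" |
    sort -rn |
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            # Walk contigs from longest to shortest until half the total is reached.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

Usage: n50_from_fasta Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta. If sequence_info uses the same definition, the result should agree with the N50 line of the report.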

time clc_ref_assemble_long -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -d Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta -o Bb9119Bb9205CLC.fasta.cas
Progress: 100.0 %
real 29m20.580s
user 41m57.960s
sys 0m25.540s

assembly_info -p fb ss 100 400 Bb9119Bb9205CLC.fasta.cas > Bb9119Bb9205CLC.fasta.cas.txt

General info:

  Program name         clc_ref_assemble_long
  Program version      4.01beta.59919
  Program parameters   -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -d Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta -o Bb9119Bb9205CLC.fasta.cas

  Contig files:
    Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta [ 5804 / 34865275 ]

  Read files:
    Bb9119Bb9205Reads_trim_NoDuplicates.fasta [ 57308770 / 4999825040 ] <paired>

Read info:

  Contigs                          5804
  Reads                        57308770
    Unmapped reads               374256    0.65 %
    Mapped reads               56934514   99.35 %
      Multi hit reads            851323    1.50 %
    Paired                     52428046   91.48 %
    Unpaired                    4880724    8.52 %

Paired end info:

  Paired reads                 52428048   91.48 %
    Average distance                263.79
    99.9 % of pairs between         101 - 398
    99.0 % of pairs between         108 - 383
    95.0 % of pairs between         131 - 357

  Unpaired reads                4880722    8.52 %
    Both seqs not matching       116950    2.40 %
    One seq not matching         514612   10.54 %
    Both seqs matching          4249160   87.06 %
      Different contigs         3063536   72.10 %
      Wrong directions           878644   20.68 %
      Too close                  168208    3.96 %
      Too far                    138772    3.27 %

Coverage info:

  Mapped nucleotides         4909089732   98.19 %
  Total sites                  34865275
  Average coverage                  140.80
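The coverage figures follow from counts already in the report: average coverage is mapped nucleotides divided by total contig sites, and the mapped percentage is mapped nucleotides divided by the nucleotide total listed for the read file. A small sketch of that arithmetic with awk; the variable and function names are illustrative only:

```shell
# Values copied from the assembly_info report.
mapped_nt=4909089732     # Mapped nucleotides
total_sites=34865275     # Total sites (size of the assembly)
read_nt=4999825040       # Nucleotides in the read file (second number in [ reads / nucleotides ])

coverage_summary() {
    awk -v m="$mapped_nt" -v s="$total_sites" -v r="$read_nt" 'BEGIN {
        printf "Average coverage: %.2f\n", m / s
        printf "Mapped nucleotides: %.2f %%\n", 100 * m / r
    }'
}
coverage_summary
```

This reproduces the reported 140.80 average coverage and 98.19 % mapped nucleotides.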

Mapping Cenicafe ESTs to the Assembly of Bb9119-Bb9205

/data/process/Beauveria/mapBbTranscriptomeToGenome$ /opt/blat/blat -t=dna ../Assemblies/AssemblyBb9119Bb9205ReadsCLC/Bb9119Bb9205ReadsAsmCLC.fasta -q=rna BbassianaR50Cenicafe.fasta -ooc=/opt/blat/11.ooc BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl

awk '{print $10}' BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl | sort | uniq | wc -l
6954
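Column 10 of a PSL file is the query name, so the count above is the number of distinct ESTs with at least one alignment. blat writes a five-line header by default (unless -noHead is given), and those lines also pass through awk, so the count may include a few non-query entries. A sketch that skips the header before counting, assuming the default header is present; count_psl_queries is a hypothetical helper name:

```shell
# count_psl_queries: hypothetical helper name.
# Counts distinct query names (PSL column 10), skipping the 5-line header
# that blat writes by default. Assumes the file was produced without -noHead.
count_psl_queries() {
    tail -n +6 "$1" |
    awk -F '\t' '$10 != "" { print $10 }' |   # keep only rows that have a query name
    sort -u |
    wc -l |
    tr -d ' '                                  # some wc implementations pad with spaces
}
```

Usage: count_psl_queries BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl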

File:BbassianaCeni Blat Bb9119Bb9205.zip

R Statistics

a<-read.table("Bb9119Bb9205CLC.fasta.cas.txt", header=T)
View(a)
summary(a)

     Contig         Sites              Reads              Coverage       
 Min.   :   1   Min.   :   200.0   Min.   :      2.0   Min.   :    0.58  
 1st Qu.:1452   1st Qu.:   312.0   1st Qu.:     13.0   1st Qu.:    3.47  
 Median :2902   Median :   539.5   Median :    103.5   Median :   17.25  
 Mean   :2902   Mean   :  6007.1   Mean   :   9809.5   Mean   :   90.24  
 3rd Qu.:4353   3rd Qu.:  5974.0   3rd Qu.:   8535.2   3rd Qu.:  108.00  
 Max.   :5804   Max.   :125888.0   Max.   :4840258.0   Max.   :16053.00   

library(ggplot2)
m=ggplot(a,aes(Reads))
m + geom_histogram(aes(x=Reads),binwidth=1)+xlab("Number of Reads in Contigs")+xlim(0,430)

ReadsBb.png

Contig Size

m=ggplot(a,aes(x=Sites))
m + geom_histogram(aes(x=Sites),binwidth=70)+xlab("Contig Size")+xlim(0,80000)

RplotContigSize.png

With Zoom

RplotContigSizeZoom.png

Coverage per Contig

m=ggplot(a,aes(x=Coverage))
m + geom_histogram(aes(x=Coverage),binwidth=0.3) + xlab("Coverage per contig") + xlim(0,200)

CoveragePerContig.png

Coverage Representation

# add_labelsFromCoverage() is a user-defined helper that is not shown on this page
categoriesCoverage=add_labelsFromCoverage(a)
newDataFrame=merge(as.data.frame(categoriesCoverage),as.data.frame(a))
hist_cut = ggplot(newDataFrame, aes(x=Sites, fill=CoverageCategory))
# geom_histogram bins Sites; recent ggplot2 versions no longer accept binwidth in geom_bar
hist_cut + geom_histogram(position="fill",binwidth=150) + xlab("Contig Size") + ylab("Proportion of Coverage Category")

CoverageRepresentation.png