View Assembly of Bb9119-Bb9205 Beauveria Reads
cat ../../TrimDataBeauveria/Bb9119_trim_NoDuplicates.fasta ../../TrimDataBeauveria/Bb9205_trim_NoDuplicates.fasta > Bb9119Bb9205Reads_trim_NoDuplicates.fasta
time clc_novo_assemble -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -o Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta
Progress: 100.0 %
real 83m38.183s
user 140m59.870s
sys 0m15.870s
sequence_info -n -r Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta
File                   Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta
Number of sequences    5804
Residue counts:
  Number of A's        8665750   24.85 %
  Number of C's        8716109   25.00 %
  Number of G's        8713433   24.99 %
  Number of T's        8664887   24.85 %
  Number of N's         105096    0.30 %
  Total               34865275
Sequence lengths:
  Minimum                  200
  Maximum               125888
  Average              6007.11
  N50                    24460
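The N50 reported above can be cross-checked without the CLC tools. The following is a minimal sketch, assuming the common definition of N50 (the length of the contig at which the running sum of contig lengths, sorted longest first, reaches half of the total assembly size). Whether sequence_info uses exactly this definition is an assumption, and the function name n50 is hypothetical.

```shell
# Hypothetical helper: N50 of a FASTA file using only awk and sort.
n50() {
  # First pass: print one length per sequence (handles multi-line records).
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
  sort -rn |
  # Second pass: walk the lengths from longest to shortest until the
  # running sum reaches half of the total.
  awk '{ total += $1; lens[NR] = $1 }
       END {
         half = total / 2
         for (i = 1; i <= NR; i++) {
           run += lens[i]
           if (run >= half) { print lens[i]; exit }
         }
       }'
}

# Usage (file name taken from the assembly step above):
# n50 Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta
```

If the result differs slightly from the reported value, the tool may treat N residues or ties differently from this sketch.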
time clc_ref_assemble_long -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -d Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta -o Bb9119Bb9205CLC.fasta.cas
Progress: 100.0 %
real 29m20.580s
user 41m57.960s
sys 0m25.540s
assembly_info -p fb ss 100 400 Bb9119Bb9205CLC.fasta.cas > Bb9119Bb9205CLC.fasta.cas.txt
General info:
  Program name         clc_ref_assemble_long
  Program version      4.01beta.59919
  Program parameters   -q -p fb ss 100 400 Bb9119Bb9205Reads_trim_NoDuplicates.fasta -d Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta -o Bb9119Bb9205CLC.fasta.cas
Contig files:
  Bb9119Bb9205_Reads_trim_NoDuplicates_CLC.fasta [ 5804 / 34865275 ]
Read files:
  Bb9119Bb9205Reads_trim_NoDuplicates.fasta [ 57308770 / 4999825040 ] <paired>
Read info:
  Contigs                       5804
  Reads                     57308770
  Unmapped reads              374256    0.65 %
  Mapped reads              56934514   99.35 %
  Multi hit reads             851323    1.50 %
  Paired                    52428046   91.48 %
  Unpaired                   4880724    8.52 %
Paired end info:
  Paired reads              52428048   91.48 %
    Average distance          263.79
    99.9 % of pairs between 101 - 398
    99.0 % of pairs between 108 - 383
    95.0 % of pairs between 131 - 357
  Unpaired reads             4880722    8.52 %
    Both seqs not matching    116950    2.40 %
    One seq not mathing       514612   10.54 %
    Both seqs matching       4249160   87.06 %
      Different contigs      3063536   72.10 %
      Wrong directions        878644   20.68 %
      Too close               168208    3.96 %
      Too far                 138772    3.27 %
Coverage info:
  Mapped nucleotides      4909089732   98.19 %
  Total sites               34865275
  Average coverage            140.80
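As a sanity check on the report, the average coverage appears to be the mapped nucleotides divided by the total contig sites, and the mapped percentage is mapped reads over all reads. The sketch below recomputes both from the numbers printed above. The variable names are hypothetical, and the interpretation of the fields is an assumption based on the arithmetic rather than on the tool's documentation.

```shell
# Values copied from the assembly_info report above.
mapped_nt=4909089732      # "Mapped nucleotides"
total_sites=34865275      # "Total sites"
reads=57308770            # "Reads"
mapped_reads=56934514     # "Mapped reads"

# Assumed relation: average coverage = mapped nucleotides / total sites.
avg_cov=$(awk -v m="$mapped_nt" -v s="$total_sites" 'BEGIN { printf "%.2f", m / s }')

# Assumed relation: mapped percentage = 100 * mapped reads / reads.
mapped_pct=$(awk -v m="$mapped_reads" -v r="$reads" 'BEGIN { printf "%.2f", 100 * m / r }')

echo "average coverage: $avg_cov"
echo "mapped reads:     $mapped_pct %"
```

Both recomputed values agree with the report (140.80 and 99.35 %), which suggests the field interpretation is reasonable.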
Mapping Cenicafe ESTs to the Assembly of Bb9119-Bb9205
/data/process/Beauveria/mapBbTranscriptomeToGenome$ /opt/blat/blat -t=dna ../Assemblies/AssemblyBb9119Bb9205ReadsCLC/Bb9119Bb9205ReadsAsmCLC.fasta -q=rna BbassianaR50Cenicafe.fasta -ooc=/opt/blat/11.ooc BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl
awk '{print $10}' BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl | sort | uniq | wc -l
6954
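One caveat when counting query names this way: unless blat was run with -noHead, the PSL file starts with a few header lines, and field 10 of those lines can add spurious entries to the unique count. The following sketch counts unique query names (column 10) only on alignment rows, which it recognises by a numeric first column. The function name count_psl_queries is hypothetical, and whether the file above contains a header is not known from the log.

```shell
# Hypothetical helper: count distinct query names in a PSL file,
# ignoring header lines (alignment rows start with an integer match count).
count_psl_queries() {
  awk '$1 ~ /^[0-9]+$/ { seen[$10] = 1 }
       END { n = 0; for (q in seen) n++; print n }' "$1"
}

# Usage (file name taken from the blat step above):
# count_psl_queries BbassianaR50Cenicafe_Blat_Bb9119Bb9205ReadsAsmCLC.fasta.psl
```

Counting inside awk also avoids the separate sort, uniq and wc steps.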
File:BbassianaCeni Blat Bb9119Bb9205.zip
R Statistics
a<-read.table("Bb9119Bb9205CLC.fasta.cas.txt", header=TRUE)
View(a)
summary(a)
     Contig         Sites              Reads              Coverage
 Min.   :   1   Min.   :   200.0   Min.   :      2.0   Min.   :    0.58
 1st Qu.:1452   1st Qu.:   312.0   1st Qu.:     13.0   1st Qu.:    3.47
 Median :2902   Median :   539.5   Median :    103.5   Median :   17.25
 Mean   :2902   Mean   :  6007.1   Mean   :   9809.5   Mean   :   90.24
 3rd Qu.:4353   3rd Qu.:  5974.0   3rd Qu.:   8535.2   3rd Qu.:  108.00
 Max.   :5804   Max.   :125888.0   Max.   :4840258.0   Max.   :16053.00
library(ggplot2)
m=ggplot(a,aes(Reads))
m + geom_histogram(aes(x=Reads),binwidth=1)+xlab("Number of Reads in Contigs")+xlim(0,430)
Contig Size
m=ggplot(a,aes(x=Sites))
m + geom_histogram(aes(x=Sites),binwidth=70)+xlab("Contig Size")+xlim(0,80000)
With Zoom
Coverage per Contig
m=ggplot(a,aes(x=Coverage))
m + geom_histogram(aes(x=Coverage),binwidth=0.3) + xlab("Coverage per contig") + xlim(0,200)
Coverage Representation
categoriesCoverage=add_labelsFromCoverage(a)
newDataFrame=merge(as.data.frame(categoriesCoverage),as.data.frame(a))
hist_cut = ggplot(newDataFrame, aes(x=Sites, fill=CoverageCategory))
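# geom_histogram bins the continuous Sites axis; recent ggplot2 releases no longer accept binwidth in geom_bar
hist_cut + geom_histogram(position="fill",binwidth=150) + xlab("Contig Size") + ylab("Proportion of Coverage Category")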