Navigation

 ·   Wiki Home
 ·   Data Processing
 ·   Hemileia vastatrix
 ·   Hypothenemus hampei
 ·   Coffea
 ·   Beauveria bassiana
 ·  
 ·   Title List
 ·   Uncategorized Pages
 ·   Random Page
 ·   Recent Changes
 ·   Wiki Help
 ·   What Links Here

Active Members:

Search:

 

Create or Find Page:

 

View Discovering novel genes and transcripts - Identifying differentially expressed and regulated genes

RNA-seQC

http://rna-seqblog.com/data-analysis/other-tools/rna-seqc-a-series-of-quality-control-metrics-for-rna-seq-data/

We installed but, could not make it work.

Cufflinks

Format reference genome with bowtie

bowtie-build ../cenicafe/rust/genome/ThirdHybridAssemblyCLC.fasta ThirdHybridAssemblyCLC
generating this files: ThirdHybridAssemblyCLC.1.ebwt ThirdHybridAssemblyCLC.2.ebwt ThirdHybridAssemblyCLC.3.ebwt ThirdHybridAssemblyCLC.4.ebwt ThirdHybridAssemblyCLC.fa ThirdHybridAssemblyCLC.rev.1.ebwt ThirdHybridAssemblyCLC.rev.2.ebwt

Map the reads for each strain to the reference genome

/opt/tophat-1.3.3/bin/tophat -r 100 -o tophat_Hv701 CLCAssemblyAll454HvCatIllum /data/rawData/rawDataRoya/RNASeqBGI/Hv701_L1_1.fq /data/rawData/rawDataRoya/RNASeqBGI/Hv701_L1_2.fq

/opt/tophat-1.3.3/bin/tophat -r 100 -o tophat_Hv955 /data/process/DBs/bowtieDBs/ThirdHybridAssemblyCLC /data/rawData/rawDataRoya/RNASeqBGI/Hv955_L1_1.fq /data/rawData/rawDataRoya/RNASeqBGI/Hv955_L1_2.fq

/opt/tophat-1.3.3/bin/tophat -r 100 -o tophat_HvCatNor /data/process/DBs/bowtieDBs/ThirdHybridAssemblyCLC /data/rawData/rawDataRoya/RNASeqBGI/HvCatNor7_L1_1.fq /data/rawData/rawDataRoya/RNASeqBGI/HvCatNor7_L1_2.fq

HvCatNor Reads mapped to ThirdHybridAssemblyCLC

Screen_shot_2012-02-22_at_8.53.02_AM.png

Screen_shot_2012-02-22_at_8.53.44_AM.png

Mapping statistics

samtools flagstat tophat_Hv701/accepted_hits.bam
38037360 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
38037360 + 0 mapped (100.00%:-nan%)
38037360 + 0 paired in sequencing
19200863 + 0 read1
18836497 + 0 read2
33934540 + 0 properly paired (89.21%:-nan%)
34515086 + 0 with itself and mate mapped
3522274 + 0 singletons (9.26%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

To run cufflinks is necesary convert to .sam

samtools view -h -o tophat_Hv701/accepted_hits.bam.sam tophat_Hv701/accepted_hits.bam

Running cufflinks

/opt/cufflinks-1.3.0/bin/cufflinks -o cufflinks_Hv701 tophat_Hv701/accepted_hits.bam.sam
/opt/cufflinks-1.3.0/bin/cufflinks -o cufflinks_Hv955 tophat_Hv955/accepted_hits.bam.sam
/opt/cufflinks-1.3.0/bin/cufflinks -o cufflinks_HvCatNor tophat_HvCatNor/accepted_hits.bam.sam

Merged the transcripts

ls cufflinks_Hv701/transcripts.gtf cufflinks_Hv955/transcripts.gtf cufflinks_HvCatNor/transcripts.gtf > assemblies.txt
cuffmerge -s /data/process/DBs/cenicafe/rust/genome/ThirdHybridAssemblyCLC.fasta assemblies.txt

The final, merged annotation will be in the file merged_asm/merged.gtf, Now can open merged.gtf with visualization software for example IGV.

Counting contigs with reads mapped : awk ‘{print $1}’ merged_asm/merged.gtf | sort | uniq | wc -l
14494

Screen_shot_2012-02-20_at_3.37.44_PM.png

Differential analysis with gene and transcript discovery

cuffdiff -L Hv701,Hv955,HvCatNor -p 4 -o cuffdiff_Hv701_Hv955_HvCatNor merged_asm/merged.gtf /data/process/Roya/cufflinks/tophat_Hv701/accepted_hits.bam.sam /data/process/Roya/cufflinks/tophat_Hv955/accepted_hits.bam.sam /data/process/Roya/cufflinks/tophat_HvCatNor/accepted_hits.bam.sam

R Statistics

Open R and install the package cummeRbund: > biocLite(“cummeRbund”)

setwd(”~/tmp/RNASeqRoya/cuffdiff_Hv701_Hv955_HvCatNor”)
library(cummeRbund)
cuff<-readCufflinks()
cuff

CuffSet instance with:
  3 samples
  21345 genes
  25297 isoforms
  22469 TSS
  0 CDS
  64035 promoters
  67407 splicing
  0 relCDS

Distributions of FPKM scores across samples

FPKM: Fragments Per Kilobase of exon per Million fragments mapped

dens <- csDensity(genes(cuff))
dens

FPKM.png

> b <- csBoxplot(genes(cuff))
> b

boxplot.png

gene.features <- features(genes(cuff))
> head(gene.features)

     gene_id class_code nearest_ref_id gene_short_name                    locus length coverage gene_id
1 XLOC_000001       <NA>           <NA>            <NA>       contig_1:1677-2020     NA       NA    <NA>
2 XLOC_000002       <NA>           <NA>            <NA>      contig_100003:0-214     NA       NA    <NA>
3 XLOC_000003       <NA>           <NA>            <NA>  contig_100006:2319-5847     NA       NA    <NA>
4 XLOC_000004       <NA>           <NA>            <NA>  contig_100006:5992-7648     NA       NA    <NA>
5 XLOC_000005       <NA>           <NA>            <NA> contig_100006:7844-14750     NA       NA    <NA>
6 XLOC_000006       <NA>           <NA>            <NA>  contig_100006:1502-2070     NA       NA    <NA>

Tracks the summed FPKM of transcripts sharing each gene_id

gene.fpkm <- fpkm(genes(cuff))
> head(gene.fpkm)

     gene_id sample_name       fpkm    conf_hi conf_lo quant_status
1 XLOC_000001       Hv701   0.706883    2.82753       0           OK
2 XLOC_000001       Hv955   1.301980    3.98609       0           OK
3 XLOC_000001    HvCatNor   6.155110   14.27330       0           OK
4 XLOC_000002       Hv701 156.857000  346.37600       0           OK
5 XLOC_000002       Hv955 167.752000  370.96100       0           OK
6 XLOC_000002    HvCatNor 944.138000 1922.78000       0           OK

Generate a heatmap of FPKM expression values of 20 genes like example

> data(sampleData)
> myGeneIds <- sampleIDs
> myGeneIds [1] “XLOC_001363” “XLOC_001297” “XLOC_001339” “XLOC_000132” “XLOC_001265” “XLOC_000151” “XLOC_001359” “XLOC_000069” [9] “XLOC_000170” “XLOC_000105” “XLOC_001262” “XLOC_001348” “XLOC_001411” “XLOC_001369” “XLOC_000158” “XLOC_001370”
[17] “XLOC_001263” “XLOC_000115” “XLOC_000089” “XLOC_001240”
>myGenes <- getGenes(cuff, myGeneIds)
>h <- csHeatmap(myGenes, cluster = “both”)
> h

heatmap.png

b <- expressionBarplot(myGenes)
>b

barplot.png

References

1. Cufflinks
2. cummeRbund