Entering edit mode
Ravi Karra
▴
140
@ravi-karra-4463
Last seen 10.2 years ago
Hi,
Just starting to learn how to look at RNA Seq data, so apologies in
advance. I ran my RNA-Seq experiment on a GAII and aligned to the
zebrafish genome using Bowtie2/Tophat2. I downloaded the current
zebrafish genome (Zv9) and transcript gtf file from Ensembl for the
reference indices. I am trying to use edgeR to look at differential
expression, but am a little hung up on getting the count data.
As you can see from the code below, I input 8835090 mapped reads, but
only 5380643 are overlapped with known transcripts. It seems that I
am losing reads in summarizing the count data and I can't really
figure out why. Is the transcript information that results from
makeTranscriptDbFromBiomart identical to the transcript information in
the gtf files that can be downloaded via Ensembl?
Thanks again,
Ravi
zv9txs = makeTranscriptDbFromBiomart(biomart="ensembl",dataset="drerio
_gene_ensembl")
tx_by_gene=transcriptsBy(zv9txs,'gene')
#read in the mapped bam hits file
reads=readBamGappedAlignments("accepted_hits.bam")
#check compatibility of names
txs = names(seqlengths(tx_by_gene))
r = as.character(unique(rname(reads)))
table (r %in% txs) #all reads mapped reads have the same chr names
TRUE
752
#make a genomic Ranges object and remove strand ambiguity
reads=GRanges(seqnames=rname(reads),ranges=IRanges(start=start(reads),
end=end(reads)), strand=rep("*",length(reads)))
#summarize counts data
counts=countOverlaps(tx_by_gene,reads)
sum (counts)
[1] 5380643
length (reads)
[1] 8835090
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocInstaller_1.2.1 Rsamtools_1.6.3 Biostrings_2.22.0
biomaRt_2.10.0
[5] GenomicFeatures_1.6.9 AnnotationDbi_1.16.19 Biobase_2.14.0
GenomicRanges_1.6.7
[9] IRanges_1.12.6
loaded via a namespace (and not attached):
[1] bitops_1.0-4.1 BSgenome_1.22.0 DBI_0.2-5
grid_2.14.1 hwriter_1.3
[6] lattice_0.20-6 RCurl_1.91-1 RSQLite_0.11.1
rtracklayer_1.14.4 ShortRead_1.12.4
[11] tools_2.14.1 XM