Question: What populates makeTranscriptDbFromBiomart?
6.9 years ago by
Ravi Karra • 140
Ravi Karra • 140 wrote:
Hi, Just starting to learn how to look at RNA Seq data, so apologies in advance. I ran my RNA-Seq experiment on a GAII and aligned to the zebrafish genome using Bowtie2/Tophat2. I downloaded the current zebrafish genome (Zv9) and transcript gtf file from Ensembl for the reference indices. I am trying to use edgeR to look at differential expression, but am a little hung up on getting the count data. As you can see from the code below, I input 8835090 mapped reads, but only 5380643 are overlapped with known transcripts. It seems that I am losing reads in summarizing the count data and I can't really figure out why. Is the transcript information that results from makeTranscriptDbFromBiomart identical to the transcript information in the gtf files that can be downloaded via Ensembl? Thanks again, Ravi zv9txs = makeTranscriptDbFromBiomart(biomart="ensembl",dataset="drerio _gene_ensembl") tx_by_gene=transcriptsBy(zv9txs,'gene') #read in the mapped bam hits file reads=readBamGappedAlignments("accepted_hits.bam") #check compatibility of names txs = names(seqlengths(tx_by_gene)) r = as.character(unique(rname(reads))) table (r %in% txs) #all reads mapped reads have the same chr names TRUE 752 #make a genomic Ranges object and remove strand ambiguity reads=GRanges(seqnames=rname(reads),ranges=IRanges(start=start(reads), end=end(reads)), strand=rep("*",length(reads))) #summarize counts data counts=countOverlaps(tx_by_gene,reads) sum (counts)  5380643 length (reads)  8835090 > sessionInfo() R version 2.14.1 (2011-12-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale:  en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages:  stats graphics grDevices utils datasets methods base other attached packages:  BiocInstaller_1.2.1 Rsamtools_1.6.3 Biostrings_2.22.0 biomaRt_2.10.0  GenomicFeatures_1.6.9 AnnotationDbi_1.16.19 Biobase_2.14.0 GenomicRanges_1.6.7  IRanges_1.12.6 loaded via a namespace (and not attached):  bitops_1.0-4.1 BSgenome_1.22.0 DBI_0.2-5 grid_2.14.1 hwriter_1.3  lattice_0.20-6 RCurl_1.91-1 RSQLite_0.11.1 rtracklayer_1.14.4 ShortRead_1.12.4  tools_2.14.1 XM
ADD COMMENT • link •