Question: genomicRanges/Features reads counted in features and inbetween features is not the total number of counted reads
gravatar for Sam McInturf
3.8 years ago by
Sam McInturf300
United States
Sam McInturf300 wrote:
Hello everyone, I am having trouble counting RNA seq reads with GenomicRanges and GenomicFeatures. Essentially the problem is I have many reads (10 million mapped) and I only can count 200K in features and 300K between features. So obviously I am doing something wrong. I am working with Arabidopsis thaliana, so I mapped to the TAIR10.17 release with tophat2. The gff file I use is TAIR10.17.gtf, I also tried using TAIR10.16.gtf. I also made a GRangesList with the 10.17 with no effect. This is what I have done while using 1 library #outside of R stuff tophat2 (no -G ) blah blah samtools sort accepted_hits.bam foo samtools index foo #In R #load bams bf <- "myBam.bam" bi <- "myBam.bai" bfl <- BamFileList(bf, bi) #set up TxDb gff <- makeTranscriptDbFromGFF(gff) tx <- transcriptsBy(gff) #summarize olap <- summarizeOverlaps(tx, bfl) #how many reads mapped? colSums(assays(olap)$counts) #only 222,835! are the reads inbetween features? abf <- BamFile("path to a single bam file") reads <- readGappedAlignments(bf) tx2 <- transcripts(gff) strand(tx2) <- "*" txs <- reduce(tx2) nontx <- gaps(txs) o <- summarizeOverlaps(nontx, reads, ignore.strand = TRUE) sum(o) #365401 So what happened? If the reads are not in the features, and not inbetween the features, where did they go? Anything helps, and thanks in advance Sam > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 [4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 [7] BSgenome_1.30.0 DBI_0.2-7 GenomicFeatures_1.14.2 [10] GenomicRanges_1.14.4 IRanges_1.20.6 parallel_3.0.2 [13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4 [16] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 [19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 -- Sam McInturf [[alternative HTML version deleted]]
ADD COMMENTlink written 3.8 years ago by Sam McInturf300
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 120 users visited in the last hour