genomicRanges/Features reads counted in features and inbetween features is not the total number of counted reads
0
0
Entering edit mode
Sam McInturf ▴ 300
@sam-mcinturf-5291
Last seen 9.2 years ago
United States
Hello everyone, I am having trouble counting RNA seq reads with GenomicRanges and GenomicFeatures. Essentially the problem is I have many reads (10 million mapped) and I only can count 200K in features and 300K between features. So obviously I am doing something wrong. I am working with Arabidopsis thaliana, so I mapped to the TAIR10.17 release with tophat2. The gff file I use is TAIR10.17.gtf, I also tried using TAIR10.16.gtf. I also made a GRangesList with the 10.17 with no effect. This is what I have done while using 1 library #outside of R stuff tophat2 (no -G ) blah blah samtools sort accepted_hits.bam foo samtools index foo #In R #load bams bf <- "myBam.bam" bi <- "myBam.bai" bfl <- BamFileList(bf, bi) #set up TxDb gff <- makeTranscriptDbFromGFF(gff) tx <- transcriptsBy(gff) #summarize olap <- summarizeOverlaps(tx, bfl) #how many reads mapped? colSums(assays(olap)$counts) #only 222,835! are the reads inbetween features? abf <- BamFile("path to a single bam file") reads <- readGappedAlignments(bf) tx2 <- transcripts(gff) strand(tx2) <- "*" txs <- reduce(tx2) nontx <- gaps(txs) o <- summarizeOverlaps(nontx, reads, ignore.strand = TRUE) sum(o) #365401 So what happened? If the reads are not in the features, and not inbetween the features, where did they go? Anything helps, and thanks in advance Sam > sessionInfo() R version 3.0.2 (2013-09-25) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0 [4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6 [7] BSgenome_1.30.0 DBI_0.2-7 GenomicFeatures_1.14.2 [10] GenomicRanges_1.14.4 IRanges_1.20.6 parallel_3.0.2 [13] RCurl_1.95-4.1 Rsamtools_1.14.2 RSQLite_0.11.4 [16] rtracklayer_1.22.0 stats4_3.0.2 tools_3.0.2 [19] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0 -- Sam McInturf [[alternative HTML version deleted]]
GenomicRanges GenomicRanges • 1.1k views
ADD COMMENT

Login before adding your answer.

Traffic: 626 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6