large memory footprint with summarizeOverlaps method for BamViews
alex.gos90 (Germany)

Hello,

I would like to point out that when I call summarizeOverlaps() on a BamViews object that was defined with specific bamRanges, the whole BAM file is loaded into memory unless I explicitly provide the param argument.

Here is an example:

```r
library(GenomicAlignments)  # also attaches Rsamtools, GenomicRanges and S4Vectors

tiny_bam <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
fl <- c(tiny_bam, tiny_bam)
rngs <- GRanges(c("seq1", "seq2"), IRanges(1, c(15, 15)))
samp <- DataFrame(info=c("ex1", "ex2"), row.names=c("ex1", "ex2"))

# define the BamViews for multiple files using Rsamtools
view <- BamViews(bamPaths=fl, bamSamples=samp, bamRanges=rngs)
```

These two calls therefore have different memory footprints. In the first, the whole BAM file is loaded:

```r
se <- summarizeOverlaps(view, mode=Union, ignore.strand=TRUE)
```

while in the second, only the reads that overlap the given ranges are loaded:

```r
se <- summarizeOverlaps(view,
                        mode=Union,
                        ignore.strand=TRUE,
                        param=ScanBamParam(which=rngs))
```
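
To make the difference between the two calls concrete, one can count how many alignments are read from one of the files with and without the `which` restriction. This is a minimal sketch using readGAlignments() on the same ex1.bam file; the variable names `n_all` and `n_sub` are illustrative, and the sketch only counts records rather than measuring memory directly.

```r
library(GenomicAlignments)

tiny_bam <- system.file("extdata", "ex1.bam", package = "Rsamtools", mustWork = TRUE)
rngs <- GRanges(c("seq1", "seq2"), IRanges(1, c(15, 15)))

# Without 'which', every mapped alignment in the file is read into memory
n_all <- length(readGAlignments(tiny_bam))

# With 'which', only alignments overlapping 'rngs' are read
n_sub <- length(readGAlignments(tiny_bam, param = ScanBamParam(which = rngs)))

print(c(all = n_all, in_ranges = n_sub))
```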

I saw in the source code of the readGAlignments() method for BamViews (https://github.com/Bioconductor/GenomicAlignments/blob/master/R/readGAlignments.R#L138-L159) that the ScanBamParam is updated internally using the bamRanges() of the BamViews object. The same could be done in summarizeOverlaps(), which would remove the need to provide the ranges a second time through the param argument.
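
Until something like this is built in, a caller-side workaround in the same spirit is a small wrapper that copies bamRanges() into the `which` field of the ScanBamParam when the caller has not already set it. The function name `summarizeOverlapsInRanges` is hypothetical and not part of GenomicAlignments; this is a sketch under that assumption, not the package's implementation.

```r
library(GenomicAlignments)

## Hypothetical helper, not part of GenomicAlignments: restrict reading to the
## bamRanges() of a BamViews unless the caller already set 'which' in 'param'.
summarizeOverlapsInRanges <- function(view, ..., param = ScanBamParam()) {
    stopifnot(is(view, "BamViews"), is(param, "ScanBamParam"))
    if (length(bamWhich(param)) == 0L) {
        # Copy the view's ranges into the param so only overlapping reads are loaded
        bamWhich(param) <- bamRanges(view)
    }
    # Dispatches to the existing summarizeOverlaps() method for BamViews
    summarizeOverlaps(view, ..., param = param)
}
```

With the objects from the example, it would be called as `summarizeOverlapsInRanges(view, mode=Union, ignore.strand=TRUE)`. Any other ScanBamParam settings the caller passes (flags, MAPQ filter) are kept, because only the `which` field is filled in.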

I think this would improve the usability of the function, and I just wanted to let the developers of the very good GenomicAlignments package know.

Best,

Alex

Tags: genomicalignments, rsamtools, BamViews, summarizeoverlaps