Question

large memory footprint with summarizeOverlaps method for BamViews

alex.gos90 ▴ 10

@alexgos90-13597

Germany

Hello,

I would like to point out that when I call summarizeOverlaps on a BamViews object that was defined with specific bamRanges, the whole BAM file is loaded into memory unless I explicitly provide the param argument.

Here is an example:

library(GenomicAlignments)
tiny_bam <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)
fl <- c(tiny_bam,tiny_bam)
rngs <- GRanges(c("seq1", "seq2"), IRanges(1, c(15, 15)))
samp <- DataFrame(info=c("ex1","ex2"), row.names=c("ex1","ex2"))

# define the BamViews for multiple files using Rsamtools
view <- BamViews(bamPaths = fl, bamSamples=samp, bamRanges=rngs)

The following two calls have different memory footprints. The first loads the whole BAM file:

se <- summarizeOverlaps(view, mode=Union, ignore.strand=TRUE)

The second loads only the reads that fall within the given ranges:

se <- summarizeOverlaps(view, 
                        mode=Union, 
                        ignore.strand=TRUE,
                        param=ScanBamParam(which = rngs))

I saw in the source code of the readGAlignments method for BamViews (https://github.com/Bioconductor/GenomicAlignments/blob/master/R/readGAlignments.R#L138-L159) that the ScanBamParam could be updated internally using the bamRanges() of the BamViews object. Doing the same in summarizeOverlaps would remove the need to provide the ranges a second time through the param argument.
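To make the suggestion concrete, here is a minimal sketch of what I mean, written as a user-side wrapper rather than a change to the package. The name summarizeOverlapsInRanges is hypothetical and not part of GenomicAlignments; it only assumes that bamRanges() returns the ranges stored in the view and that they can be passed as the which argument of ScanBamParam(), as in my second call above.

```r
library(GenomicAlignments)  # also attaches Rsamtools (BamViews, ScanBamParam)

# Hypothetical helper, not part of GenomicAlignments:
# restrict reading to the ranges already stored in the BamViews object,
# so that they do not have to be supplied a second time via 'param'.
summarizeOverlapsInRanges <- function(view, ...) {
    # Build the ScanBamParam from the view's own ranges
    param <- ScanBamParam(which = bamRanges(view))
    # Remaining arguments (mode, ignore.strand, ...) are passed through unchanged
    summarizeOverlaps(view, ..., param = param)
}
```

With the view from the example above, this would be called as summarizeOverlapsInRanges(view, mode=Union, ignore.strand=TRUE) and should behave like my second call. A fix inside the package could of course look different.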

I think this would improve usability of the function and just wanted to let the developers of the very good GenomicAlignments package know.

Best,

Alex

genomicalignments rsamtools BamViews summarizeoverlaps • 1.2k views

updated 6.1 years ago by Martin Morgan 25k • written 6.1 years ago by alex.gos90 ▴ 10