Question: Memory issues in summarizeOverlaps function
Diana wrote:

Hi all,

I get a memory error ('error: cannot allocate vector of size 344.5 Mb') when running summarizeOverlaps from the GenomicAlignments package. I have 4 GB of RAM (about 3.8 GB of it free) and I use 64-bit R. I also increased memory.limit to 3500 and tried running R with --vanilla as well. Nothing seems to work. Do you have any ideas? Thanks a lot!

(written 8 months ago by Diana; modified 8 months ago by James W. MacDonald)
Answer: Memory issues in summarizeOverlaps function
James W. MacDonald (United States) wrote:

Assuming you are reading in data from BAM files, you should try reading the data in chunks. See ?BamFile, particularly the yieldSize argument, and the examples which show how it's used.

(written 8 months ago by James W. MacDonald)
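
A minimal sketch of the chunked-reading pattern that ?BamFile describes, assuming Rsamtools and GenomicAlignments are installed. It uses the small ex1.bam file that ships with Rsamtools as a stand-in for a real BAM file, and the chunk size of 1000 records is an arbitrary choice for illustration. Once the BamFile is opened, each call to readGAlignments() continues where the previous one stopped, and an empty result signals the end of the file.

```r
library(Rsamtools)          # BamFile(), open(), close()
library(GenomicAlignments)  # readGAlignments()

# Small example BAM shipped with Rsamtools, standing in for a real file
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# yieldSize caps the number of records returned by each read (1000 is arbitrary)
bf <- BamFile(fl, yieldSize = 1000)

open(bf)
n_reads  <- 0L
n_chunks <- 0L
repeat {
    chunk <- readGAlignments(bf)  # next chunk; continues where the last one stopped
    if (length(chunk) == 0L)      # an empty chunk means the whole file has been read
        break
    n_reads  <- n_reads + length(chunk)
    n_chunks <- n_chunks + 1L
}
close(bf)

c(reads = n_reads, chunks = n_chunks)
```

In a real analysis each chunk would be summarized and then discarded, so only about yieldSize records need to be held in memory at a time. As noted later in the thread, summarizeOverlaps does this chunking itself when given a BamFileList with a yieldSize.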

Hi James,

Thanks for your answer! Yes, I am reading BAM files. I know about the yieldSize argument, but the file itself is only about 500 MB, so isn't the memory error a bit strange? What could explain it other than a lack of RAM (which is not the case here)?

(written 8 months ago by Diana)

You say you are reading BAM files, but then you say 'the file itself', so it's not clear whether you are reading one file or several. Anyway, having a computer with 4 GB of RAM doesn't mean you actually have that much RAM available to allocate to R; it may be much less, depending on what else is running. And reading in a 500 MB file will probably take more RAM than you would expect, given the intermediate copies that may be created. If you are on Windows, which sometimes has trouble releasing memory, the problem may be exacerbated.

I wouldn't use a Windows box with 4 GB of RAM for anything beyond really basic tasks (16 GB is about the lowest I would go, even for casual use), so it doesn't surprise me at all that you run out of RAM when trying to do something real.

You say that you 'know the yieldSize argument'. Does that mean you are using it, or just that you know it exists?

(written 8 months ago by James W. MacDonald)

Sorry, I am currently reading in a single file. As for the yieldSize argument, I only know it exists; I haven't tried it yet, because I assumed it would take a very long time to read the whole file in separate chunks. I will try it with a yieldSize of 2000000 to start with.

(written 8 months ago by Diana; modified 8 months ago)

> bfl <- BamFileList("../../data/star_aligned/303360Aligned.sortedByCoord.out.bam")
> system.time(summarizeOverlaps(ensex, bfl))
   user  system elapsed 
233.280  34.764 268.371 

> bfl <- BamFileList("../../data/star_aligned/303360Aligned.sortedByCoord.out.bam", yieldSize = 2e5)
> system.time(summarizeOverlaps(ensex, bfl))
   user  system elapsed 
222.436   3.960 226.655 
(written 8 months ago by James W. MacDonald)
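
Besides the timing, one might want to confirm that reading in chunks leaves the counts unchanged. A minimal sketch, assuming GenomicAlignments is installed; it uses Rsamtools' ex1.bam and three hypothetical feature ranges (arbitrary 200 bp windows on the example reference sequences seq1 and seq2) in place of a real BAM file and the ensex annotation.

```r
library(GenomicAlignments)     # summarizeOverlaps(), also attaches Rsamtools/GenomicRanges
library(SummarizedExperiment)  # assay()

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Hypothetical features; coordinates are arbitrary windows on seq1/seq2
features <- GRanges(
    seqnames = c("seq1", "seq1", "seq2"),
    ranges   = IRanges(start = c(100, 1000, 500), width = 200)
)

# Whole file in one go versus 500-record chunks
se_all   <- summarizeOverlaps(features, BamFileList(fl))
se_chunk <- summarizeOverlaps(features, BamFileList(fl, yieldSize = 500))

assay(se_all)
all(assay(se_all) == assay(se_chunk))  # chunking should not change any count
```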

Hi, I still have one question, about the reduceByYield function. I have the following code:

> csvfile <- file.path("W29-1-1.csv")
> sampleTable <- read.csv(csvfile,row.names=1)
> sampleTable
       File
1 W29-1-1-B
2 W29-1-1-F
> setwd("C:/Program Files/BAM files")
> filename <- file.path(paste0(sampleTable$File, "_aligned_genome_anonymized.sorted29.bam"))
> file.exists(filename)
[1] TRUE TRUE
> library("Rsamtools")
> library(GenomicFiles)
> library(GenomicFeatures)
> library(GenomicRanges)
> library("GenomicAlignments")
> library("BiocParallel")
> library("Rsamtools")
> bamfiles <- BamFileList(filename, yieldSize=2000000)
> x <- bamfiles
> YIELD <- readGAlignments
> reduceByYield(x, YIELD, MAP=identity, REDUCE='+', parallel=FALSE)

However, I get the following error:

Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘readGAlignments’ for signature ‘"BamFileList"’

My next steps are counting reads with summarizeOverlaps and performing a differential expression analysis with edgeR. This works fine with my current yieldSize of 2000000, but I want to perform these analyses on the complete BAM files. Do you know how I can make reduceByYield work?

(written 8 months ago by Diana)

Why are you doing that? Simply passing summarizeOverlaps a BamFileList for which you have specified a yieldSize will cause the data to be read in chunks.

(written 8 months ago by James W. MacDonald)
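
For completeness: the error above says readGAlignments() has no method for a BamFileList, and reduceByYield() (from GenomicFiles) iterates over a single file object such as a BamFile. Also, summing with `+` only makes sense once MAP has turned each chunk into numbers. A minimal sketch, assuming GenomicFiles and GenomicAlignments are installed, that takes one BamFile out of the list with [[ ]] and has MAP compute per-feature overlap counts; ex1.bam and the three feature ranges are hypothetical stand-ins for the real BAM files and annotation.

```r
library(GenomicFiles)        # reduceByYield()
library(GenomicAlignments)   # readGAlignments(), overlap methods for alignments

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")
bamfiles <- BamFileList(fl, yieldSize = 500)

# Hypothetical features standing in for the real annotation
features <- GRanges(c("seq1", "seq1", "seq2"),
                    IRanges(start = c(100, 1000, 500), width = 200))

YIELD  <- function(x, ...) readGAlignments(x)             # next chunk from one BamFile
MAP    <- function(aln, ...) countOverlaps(features, aln)  # per-feature counts for the chunk
REDUCE <- `+`                                             # add counts across chunks

# [[1]] extracts a single BamFile; the BamFileList itself has no readGAlignments method
counts <- reduceByYield(bamfiles[[1]], YIELD, MAP, REDUCE, parallel = FALSE)
counts
```

This only counts simple overlaps for one file, whereas summarizeOverlaps also applies its counting modes and handles every file in the list, so the approach in the reply above remains the simpler route.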

Really? So simply creating se with summarizeOverlaps will actually count all the reads? That would be great. But how is it possible that tail(assay(se)) shows 9997 as the last row, while rowRanges(se) returns an object of length 25892? Sorry for asking what are probably basic questions.

(written 8 months ago by Diana)

I think you might be confused. The row names of a SummarizedExperiment are the underlying feature IDs (which in your case might be Entrez Gene IDs), not row numbers. The yieldSize argument simply sets the chunk size for the data being read in, not the total amount of data to read:

> bams <- c("303301Aligned.sortedByCoord.out.bam","303362Aligned.sortedByCoord.out.bam")
> bfl <- BamFileList(bams)
> se_all <- summarizeOverlaps(ensex, bfl)
> bfl <- BamFileList(bams, yieldSize = 2e5)
> se_by_yield <- summarizeOverlaps(ensex, bfl)
> se_all
class: RangedSummarizedExperiment 
dim: 225589 2 
metadata(0):
assays(1): counts
rownames(225589): ENSSSCG00000000002 ENSSSCG00000000002 ...
  ENSSSCG00000040989 ENSSSCG00000040989
rowData names(0):
colnames(2): 303301Aligned.sortedByCoord.out.bam
  303362Aligned.sortedByCoord.out.bam
colData names(0):
> se_by_yield
class: RangedSummarizedExperiment 
dim: 225589 2 
metadata(0):
assays(1): counts
rownames(225589): ENSSSCG00000000002 ENSSSCG00000000002 ...
  ENSSSCG00000040989 ENSSSCG00000040989
rowData names(0):
colnames(2): 303301Aligned.sortedByCoord.out.bam
  303362Aligned.sortedByCoord.out.bam
colData names(0):

Please note that the dims of both SummarizedExperiments are identical, and that the row names are (in this case) Ensembl Gene IDs.

(written 8 months ago by James W. MacDonald)
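
A small sketch of the row-name point, assuming GenomicAlignments is installed: three hypothetical ranges are given made-up, Entrez-style names ("100", "2000", "9997"), so the last label that tail() prints is a feature ID, while nrow() and length(rowRanges()) report how many features were actually counted. Rsamtools' ex1.bam again stands in for real data.

```r
library(GenomicAlignments)     # summarizeOverlaps()
library(SummarizedExperiment)  # assay(), rowRanges()

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Hypothetical features with made-up Entrez-style IDs as names
features <- GRanges(c("seq1", "seq1", "seq2"),
                    IRanges(start = c(100, 1000, 500), width = 200))
names(features) <- c("100", "2000", "9997")

se <- summarizeOverlaps(features, BamFileList(fl, yieldSize = 500))

tail(assay(se))         # rows are labelled by feature ID, so the last label is "9997"
nrow(se)                # number of features counted: 3
length(rowRanges(se))   # the same number, taken from the feature ranges
```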