Question: [DiffBind] Memory issues with dba.count()
enricoferrero wrote:

Hi Rory et al.,

I'm hitting the memory limits of my server (96GB RAM) when using DiffBind::dba.count(), which results in my job getting killed.

I'm trying to generate a count matrix from many samples (>30), which translates to many sites/peaks. I suspect the massive matrix cannot be allocated by R into memory.

I've seen the argument bLowMem mentioned in some previous discussions, but it no longer seems to be recognised by dba.count(). Is that right?

Is there any way to use dba.count() in this scenario? Would something like the bigmemory package be helpful here?

Thank you,

Answer: [DiffBind] Memory issues with dba.count()
Rory Stark wrote:

Hello-

The bLowMem parameter was replaced by bUseSummarizeOverlaps; you can try setting this to TRUE when calling dba.count(). You can also set the configuration value $config$yieldSize in your DBA object to a lower value (such as 50000), as sketched below.
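For illustration, a minimal sketch combining these two settings; the sample sheet file name and the myDBA object name are placeholders for your own:

```r
library(DiffBind)

# Build the DBA object from a sample sheet (samples.csv is a placeholder)
myDBA <- dba(sampleSheet = "samples.csv")

# Process reads in smaller chunks to cap peak memory usage
myDBA$config$yieldSize <- 50000

# Count reads with summarizeOverlaps instead of the default counting code
myDBA <- dba.count(myDBA, bUseSummarizeOverlaps = TRUE)
```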

Another approach is to use a consensus peakset with fewer peaks. If you are relying on the minOverlap parameter (default value 2), you can set it higher. Calling dba.overlap() with mode=DBA_OLAP_RATE returns a vector giving the number of consensus peaks for successively greater values of minOverlap, so you can choose an appropriate cutoff (see the sketch below).
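A short sketch of that overlap-rate check, again assuming the myDBA object from above; the minOverlap value of 5 is an arbitrary illustration, not a recommendation:

```r
# Element i of the result is the number of peaks present in at least i samples
olap.rate <- dba.overlap(myDBA, mode = DBA_OLAP_RATE)

# Plot the rate to see where the consensus peakset size levels off
plot(olap.rate, type = "b", xlab = "minOverlap", ylab = "Consensus peaks")

# Re-count using a stricter consensus (minOverlap = 5 is just an example)
myDBA <- dba.count(myDBA, minOverlap = 5, bUseSummarizeOverlaps = TRUE)
```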

I am currently looking at memory usage in DiffBind, as it does seem to occasionally balloon very high, and hope to have a fix in the next version.

Regards-

Rory

Thanks Rory,

I'll try using the summarizeOverlaps option with a lower yieldSize.

It's great to hear that you're looking at the memory consumption - it's the one thing that is keeping me from using DiffBind more extensively across projects.

Best,

FYI, in the development version of DiffBind (1.17.6 and later), we have made significant improvements in peak memory usage, reducing it by an order of magnitude, especially in the case where a binding matrix is being constructed (e.g. dba.count). I have an analysis that was taking >70GB to run and now takes 5GB. Give it a try!
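For anyone wanting to try it, a sketch of pulling in the Bioconductor development branch using the current BiocManager route (this thread predates BiocManager; the equivalent at the time was biocLite() with useDevel()):

```r
# Switch to the Bioconductor devel branch to get the development DiffBind
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "devel")
BiocManager::install("DiffBind")

# Confirm the installed version is 1.17.6 or later
packageVersion("DiffBind")
```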

Cheers-

Rory