Question

[DiffBind] Memory issues with dba.count()

enricoferrero ▴ 660

@enricoferrero-6037

Switzerland

Hi Rory et al.,

I'm hitting the memory limits of my server (96GB RAM) when using DiffBind::dba.count(), which results in my job getting killed.

I'm trying to generate a count matrix from many samples (>30), which translates to many sites/peaks. I suspect R cannot allocate the resulting matrix in memory.

I've seen the argument bLowMem mentioned in some previous discussions, but it doesn't seem to be recognised by dba.count() any longer, is that right?

Is there any way to use dba.count() in this scenario? Would something like the bigmemory package be helpful here?

Thank you,

diffbind bigmemory matrix dba.count • 2.3k views

updated 8.7 years ago by Rory Stark ★ 5.1k • written 8.7 years ago by enricoferrero ▴ 660

score 1 · Answer 1 · 2015-08-19

Rory Stark ★ 5.1k

@rory-stark-5741

Cambridge, UK

Hello-

The bLowMem parameter was replaced by bUseSummarizeOverlaps. You can try setting this to TRUE when calling dba.count(). You can also set the configuration value $config$yieldSize in your DBA object to a lower value (like 50000).
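
A minimal sketch of those two settings in R. The helper names `set_yield_size` and `count_low_memory` are hypothetical, and `samples.csv` in the usage comments is a placeholder sample sheet. Only `dba.count()`, `bUseSummarizeOverlaps`, and `$config$yieldSize` come from the answer above.

```r
# Hypothetical helper: lower the yieldSize configuration value on a DBA object.
# A DBA object is list-like, so the value can be assigned with `$`.
set_yield_size <- function(dba_obj, yield_size = 50000) {
  stopifnot(is.numeric(yield_size), length(yield_size) == 1, yield_size > 0)
  dba_obj$config$yieldSize <- yield_size
  dba_obj
}

# Hypothetical wrapper: apply the lower yieldSize, then count reads with
# bUseSummarizeOverlaps switched on.
count_low_memory <- function(dba_obj, yield_size = 50000) {
  dba_obj <- set_yield_size(dba_obj, yield_size)
  DiffBind::dba.count(dba_obj, bUseSummarizeOverlaps = TRUE)
}

# Usage (assumes DiffBind is installed and "samples.csv" is your sample sheet):
# peaks  <- DiffBind::dba(sampleSheet = "samples.csv")
# counts <- count_low_memory(peaks, yield_size = 50000)
```

If memory is still tight, trying a smaller `yield_size` is a reasonable next step, at the likely cost of a longer run time.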

Another approach is to use a consensus peakset with fewer peaks. If you are relying on the minOverlap parameter (default value 2), you can set it higher. Calling dba.overlap() with mode=DBA_OLAP_RATE will return a vector with the number of consensus peaks for successively greater values of minOverlap so you can choose an appropriate one.
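
To make the choice of minOverlap concrete, here is a small R sketch. `choose_min_overlap` and its `max_peaks` budget are hypothetical names, and the threshold of 100000 in the usage comments is an arbitrary illustration. It assumes, as described above, that element i of the `DBA_OLAP_RATE` result is the number of consensus peaks at minOverlap = i.

```r
# Hypothetical helper: smallest minOverlap whose consensus peakset fits the budget.
# `rate` is the vector of consensus peak counts for minOverlap = 1, 2, 3, ...
choose_min_overlap <- function(rate, max_peaks) {
  ok <- which(rate <= max_peaks)
  if (length(ok) == 0) {
    stop("no minOverlap value brings the peak count under max_peaks")
  }
  min(ok)
}

# Usage (assumes DiffBind is installed and `peaks` is an existing DBA object):
# rate   <- DiffBind::dba.overlap(peaks, mode = DiffBind::DBA_OLAP_RATE)
# k      <- choose_min_overlap(rate, max_peaks = 100000)
# counts <- DiffBind::dba.count(peaks, minOverlap = k)
```

Plotting `rate` before picking a value can also help, since the peak count often drops sharply over the first few values of minOverlap.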

I am currently looking at memory usage in DiffBind, as it does seem to occasionally balloon very high, and hope to have a fix in the next version.

Regards-

Rory

Thanks Rory,

I'll try using the summarizeOverlaps option with a lower yieldSize.

It's great to hear that you're looking at the memory consumption - it's the one thing that is keeping me from using DiffBind more extensively across projects.

Best,

enricoferrero ▴ 660 • 8.7 years ago

FYI, in the development version of DiffBind (1.17.6 and later), we have made significant improvements in peak memory usage, reducing it by an order of magnitude, especially in the case where a binding matrix is being constructed (e.g. dba.count). I have an analysis that was taking >70GB to run and now takes 5GB. Give it a try!

Cheers-

Rory

Rory Stark ★ 5.1k • 8.4 years ago