Hello Doron-
Yes, the memory usage when calling dba.count is definitely an issue,
one we are planning on addressing in the next version. I'll let you
know when that is available.
I see you are running dba.count with bParallel=FALSE, so you should
only be reading in one file at a time. How large (in GB, or how many
reads) is your largest BAM file? I've never seen dba.count use this
much memory! Let us know the sizes so we can see if it is something
we should be debugging. Please also send the output of sessionInfo().
Besides changing dba.count to not use so much memory, we are also
implementing an option to read the counts in directly as you have
suggested. I am hoping to check this option in fairly soon (I already
have a version of it running and use it regularly for RNA-seq data).
Regards-
Rory
From: Doron Betel <dob2014@med.cornell.edu>
Organization: WCMC
Date: Fri, 1 Feb 2013 18:05:02 -0500
To: Rory Stark <rory.stark@cancer.org.uk>
Subject: Re: [BioC] DiffBind error loading dba.count
Resent-From: Rory Stark <rory.stark@cancer.org.uk>
Hi Rory,
I came across this thread on the mailing list while looking for a
solution to a similar problem.
I have 12 ChIP-seq samples with the associated ChIP and control BAM
files.
When I run the following call:
fivehmc.peaks <- dba.count(fivehmc.peaks, minOverlap=2,
                           bParallel=FALSE, bCorPlot=FALSE, maxFilter=10)
The R session is killed by the Linux OS after consuming a huge amount
of memory (at my last check it was ~40-50 GB).
I have a Linux server with 100 GB of RAM, which should be more than
enough to read in this data.
I tried different options and poked a bit at the source code, but I
can't find a solution to this.
I can easily generate the count matrix for the peaks myself (for both
ChIP and control), but I don't know if, and how, it is possible to add
it to the DBA object without calling dba.count, or what data structure
it requires. I really like the package and it could potentially be
very useful to me, but this large memory consumption is limiting its
use.
Any ideas how i can work around this problem?
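For what it's worth, one way to build the count matrix externally is to pull the consensus peaks out of the DBA object and count over them with Rsubread. This is only a sketch: it assumes dba.peakset(..., bRetrieve=TRUE) returns the consensus peaks as a GRanges, uses hypothetical BAM file names, and does not address getting the result back into the DBA object (which is the open question above):

```r
library(DiffBind)
library(Rsubread)

# Retrieve the consensus peak set from the existing DBA object
peaks <- dba.peakset(fivehmc.peaks, minOverlap=2, bRetrieve=TRUE)

# featureCounts accepts a SAF-style data frame: GeneID, Chr, Start, End, Strand
saf <- data.frame(GeneID = paste0("peak_", seq_along(peaks)),
                  Chr    = as.character(seqnames(peaks)),
                  Start  = start(peaks),
                  End    = end(peaks),
                  Strand = "*")

# Count reads over the peaks, one file at a time, for ChIP and control BAMs
bams <- c("chip1.bam", "chip2.bam")   # hypothetical file names
fc <- featureCounts(files = bams, annot.ext = saf)
counts <- fc$counts                   # peaks x samples count matrix
```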
Thanks for your help,
doron
--
Doron Betel Ph.D.
Assistant Professor of Computational Biomedicine
Department of Medicine &
Institute for Computational Biomedicine
Weill Cornell Medical College