I have a direct question about DESeq2's memory handling. I have come to really enjoy DESeq2 for ATAC-Seq, and it is very well documented. The problem is that the count matrices can have a very large number of rows, which slows DESeq down considerably, so the parallel support is quite useful. I was curious whether there is (or has been) a push to improve memory handling in the parallel code path, mainly when running the main DESeq pipeline with parallel = TRUE. When I do this, the memory usage spikes and is roughly multiplied by the number of workers (n * memory), which can get quite extreme (I have needed 300 GB of RAM to finish a DESeq run). I was wondering whether there is any potential for supporting bigmemory matrices so that the footprint is reduced, or whether there is a way for the parallel workers to avoid copying the data for every thread. Thanks.
I tried the following on my cluster here, using either 4 or 8 cores at a time:
library(DESeq2)
library(BiocParallel)
df <- expand.grid(type=factor(1:25), patient=factor(1:3), rep=1:2)
dds <- makeExampleDESeqDataSet(n=100000, m=150)
colData(dds) <- DataFrame(df)
# ~~~ ...then either this simple design ~~~
design(dds) <- ~patient + type
dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(4))
# ~~~ ...or this with patient-specific type effects ~~~
dds$group <- factor(paste0(dds$patient, "_", dds$type))
design(dds) <- ~group
dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(4))
The max memory for 4 cores was 6 GB and the max for 8 cores was 10 GB. Our cluster uses LSF, and I requested span[hosts=1], which "indicates that all the processors allocated to this job must be on the same host".

By the way, the reason the memory gets into the low GBs is that many coefficients and statistics are stored within 'dds' beyond the counts and normalization factors, and I think it's hard to restrict which R objects are sent around to the child processes. You could be more explicit in your scripts about breaking the dataset into discrete chunks: only the dispersion trend fit and the prior variance estimation need to look across all genes (and these are fast), so the other steps could run as totally independent jobs. To be certain that objects aren't shared, you'd have to do the splitting manually (a rough sketch is below). I can help if you want to go this route. But my numbers below suggest you might be better off just lowering the number of cores and, if possible, filtering the rows a bit.
My timing (I only tried once for each) was 1400 s for the simple design with 4 cores, 1100 s for the simple design with 8 cores, 2800 s for the patient-specific design with 4 cores, and 1200 s for the patient-specific design with 8 cores. So I would expect roughly 4x these numbers for memory and timing when going from 100k to 400k rows.
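To make the manual splitting idea above more concrete, here is a rough sketch of a chunked workflow. This is my own outline, not a drop-in replacement for DESeq(); it assumes design(dds) and size factors can be set up as in the code above, and the per-chunk lapply() calls could just as well be submitted as separate cluster jobs:

# sketch of a manually chunked run (4 chunks), assuming design(dds) is set
dds <- estimateSizeFactors(dds)                  # across all genes (cheap)
idx <- sort(rep(1:4, length.out=nrow(dds)))      # assign each row to a chunk
# per-gene dispersion estimates run independently per chunk
chunks <- lapply(1:4, function(i) estimateDispersionsGeneEst(dds[idx == i, ]))
dds <- do.call(rbind, chunks)
# the dispersion trend fit and prior variance need all genes at once (fast)
dds <- estimateDispersionsFit(dds)
dispPriorVar <- estimateDispersionsPriorVar(dds)
# final (MAP) dispersions and the Wald test again run independently per chunk
chunks <- lapply(1:4, function(i) {
  sub <- estimateDispersionsMAP(dds[idx == i, ], dispPriorVar=dispPriorVar)
  nbinomWaldTest(sub)
})
dds <- do.call(rbind, chunks)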
Do you really have 400k rows with meaningful signal across a minimum number of samples? Some modest filtering (e.g. requiring >= 4 samples with normalized counts >= 10) might help bring things to a more manageable state, where the run takes around 20 minutes and uses only a few GB; a sketch of that filter is below.
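For reference, that filter could look something like this (the thresholds are just the example numbers above; tune them for your data):

dds <- estimateSizeFactors(dds)
# keep rows where at least 4 samples have normalized counts >= 10
keep <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 4
dds <- dds[keep, ]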
After saving 'dds' at the completion of the code above, you'll still have to call results() to extract comparisons, and again you'll want to use parallel=TRUE. I would recommend running the above code with version >= 1.16, and then using lfcShrink() to get shrunken LFCs for a given 'contrast':
res <- results(dds, contrast=c("group","...","..."), parallel=TRUE)
res <- lfcShrink(dds, contrast=c("group","...","..."), res=res)
In the upcoming release of DESeq2 (v1.18, which will come out at the end of the month), you could replace the two lines above with the following single, faster line of code, which builds a results table with shrunken LFCs for the contrast of interest:
res <- lfcShrink(dds, contrast=c("group","...","..."), parallel=TRUE)
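The "..." placeholders are two levels of dds$group to compare. For example, with the simulated dataset above (where the group levels follow the patient_type pattern), one possible call would be:

levels(dds$group)                      # list the available group levels
res <- lfcShrink(dds, contrast=c("group","2_1","1_1"), parallel=TRUE)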