Hi Jeff,
I tried the following on my cluster here, using either 4 or 8 cores at a time:
library(DESeq2)
library(BiocParallel)
df <- expand.grid(type=factor(1:25), patient=factor(1:3), rep=1:2)
dds <- makeExampleDESeqDataSet(n=100000,m=150)
colData(dds) <- DataFrame(df)
# ~~~ ...then either this simple design ~~~
design(dds) <- ~patient + type
dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(4))
# ~~~ ...or this with patient-specific type effects ~~~
dds$group <- factor(paste0(dds$patient, "_", dds$type))
design(dds) <- ~group
dds <- DESeq(dds, parallel=TRUE, BPPARAM=MulticoreParam(4))
The max memory for 4 cores was 6 GB and the max for 8 cores was 10 GB. Our cluster uses LSF and I requested span[hosts=1],
which "indicates that all the processors allocated to this job must be on the same host". By the way, the reason why the memory gets into the low GBs is because there are many coefficients and statistics being stored within 'dds', beyond the counts and normalization factors, and I think it's hard to restrict what R objects are being sent around to child processes. You could be more explicit in the scripts about breaking apart the dataset into discrete chunks, for example, only the dispersion fit and prior variance steps need to look across all genes (these are fast), and the other steps could occur in totally independent jobs. I think to be certain that objects wouldn't be shared, you'd have to do the splitting manually. I can help if you want to do this route. But my numbers below indicate that you might be better off just lowering the number of cores and maybe filtering the rows a bit if possible.
My timing (I just tried once for each) was 1400 s for simple design / 4 cores, 1100 s for simple design / 8 cores, 2800 s for patient-specific design / 4 cores, 1200 s for patient-specific design / 8 cores. So I would expect ~4x these numbers for memory and timing when you go from 100k to 400k rows.
Do you really have 400k rows with minimal signal across a minimum number of samples? Some modest filtering, e.g. requiring >= 4 samples with normalized counts >= 10 as in the snippet below, might help bring things to a more manageable state, where it's just taking 20 min to run and using only a few GB.
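A sketch of that filter (the thresholds of 10 normalized counts in at least 4 samples are just the example values above; adjust them to your data):
dds <- estimateSizeFactors(dds)
# keep rows where at least 4 samples have a normalized count of 10 or more
keep <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 4
dds <- dds[keep,]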
After saving 'dds' at the completion of the code above, you'll still have to call results() to extract comparisons, and again you'll want to use parallel=TRUE. I would recommend using the above code with version >= 1.16, and then use lfcShrink() to get shrunken LFC for a given 'contrast':
res <- results(dds, contrast=c("group","...","..."), parallel=TRUE)
res <- lfcShrink(dds, contrast=c("group","...","..."), res=res)
In the upcoming release of DESeq2 (v1.18 which will come out at the end of the month), you could replace the two lines above with the following faster line of code, which would build a results table with shrunken LFC for the contrast of interest:
res <- lfcShrink(dds, contrast=c("group","...","..."), parallel=TRUE)
What platform / operating system are you on?
The operating system is Ubuntu 14.04.1 LTS.
Can you give more details, e.g. the object.size() of 'dds' and how you are using BiocParallel?
Hi,
I guess I was mistaken about the object itself being big; the object.size() of dds is 24554008 bytes (~24.5 MB). The way I run BiocParallel is (threads is normally set to 16-24):
library(BiocParallel)
register(MulticoreParam(threads)) #threads normally 16-24
print("Garbage Collecting")
gc()
dds <- DESeq(dds, parallel=TRUE, modelMatrixType="expanded", betaPrior=TRUE)
The counts matrix has around 400K rows and ~100 samples. I also supply custom normalization factors.
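They are set roughly like this (myNormFactors is a placeholder for my precomputed gene-by-sample matrix of positive factors):
# matrix of positive values with the same dimensions as counts(dds)
normalizationFactors(dds) <- myNormFactors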
Thanks!
How many threads?
Normally 16-24
For debugging, can you try a subset, e.g. 50k rows, with say 4 cores, and see how long that takes and how much memory?
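Something along these lines, just as a sketch (the random subset and the seed are only there to make the test reproducible):
set.seed(1)
# random subset of 50k rows, run with 4 cores only
dds_sub <- dds[sample(nrow(dds), 50000), ]
dds_sub <- DESeq(dds_sub, parallel=TRUE, BPPARAM=MulticoreParam(4))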
I have been doing approximately that as we speak. With 8 cores and ~60k rows (after thresholding) it took around ~8 hours, while 12 cores and ~400k rows took about ~24 hours (with ~300 GB of RAM usage). The memory usage scales with the size of the counts: 8 cores / 60k rows uses 3-4 GB per thread, while 12 cores / ~400k rows used about 30 GB per thread. This made me believe that the matrix memory is the core issue (maybe it's getting copied for each thread, so the RAM spikes). The speed seems to scale appropriately. Let me know if you want any more tests.
A further question I have: is there any way to add an option to save progress during the main part of DESeq()? I have had the memory max out in bplapply towards the very end, which loses a lot of the work. Or is there an easy way to execute it step by step with parallel options, maybe a vignette?
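Something like the following checkpointing is roughly what I have in mind (just a sketch; I'm not sure these steps map exactly onto what DESeq() does with betaPrior=TRUE, and the file names are made up):
# my dds already carries custom normalization factors, so I start at dispersions
dds <- estimateDispersions(dds)
saveRDS(dds, "step1_dispersions.rds")
dds <- nbinomWaldTest(dds, betaPrior=TRUE, modelMatrixType="expanded")
saveRDS(dds, "step2_waldTest.rds")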
Thanks!
Can you explain a bit about the design? This seems way too long, even for a single core, unless you have a very wide design matrix with hundreds of coefficients to fit. I think there's some issue. I'll try to replicate a problem of roughly your dimensions and report what I get on my machines.
Sure. So the datasets that cause the main problems definitely have lots of coefficients (50+). For this design in particular we have many cell types from multiple donors, and the questions we are interested in are the differences between the cell types as a whole and across individuals.
For each individual we have replicates, which leads to a simple design where there is (sorry, I am not sure how to properly write this out) a sample column (Patient_Type_Rep) and a contrast column (Patient_Type), and the design is ~contrast in this case.
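Roughly, what I mean is something like this (Patient and Type are placeholders for our actual metadata columns):
# one factor level per patient/cell-type combination
colData(dds)$contrast <- factor(paste(colData(dds)$Patient, colData(dds)$Type, sep="_"))
design(dds) <- ~contrast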
Edit (I'm a limited user, so I cannot post a reply below for a few hours): in this case there are about ~25 Types with ~3 Patients (~75 Patient_Type combinations) with 2+ replicates, so ~150 samples in total.
Can you give me a ballpark on the number of patients and types? I'll try it with mock data on my end and report timings etc.
In this case there are about ~25 Types with ~3 Patients (~75 Patient_Type combinations) with 2+ replicates, so ~150 samples in total.