I am trying to run DESeq2 with many samples (a single-cell experiment). However, it seems to be choking on the amount of data. When I run DESeq() with parallel=TRUE, I quickly get a memory error (this does not happen for me with smaller experiments). If I run it with parallel=FALSE, it appears to stall: it does not finish even after a day. Is there any way around this? Are there other parameters I can adjust?
My typical recommendation when users have hundreds of biological replicates within each condition is to use limma + voom, which runs very quickly. In our testing, the differences between our method and linear methods like limma, or non-parametric methods like SAMseq, disappeared as the number of biological replicates per condition grew large (as you would expect).
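A minimal sketch of the limma + voom route, assuming you have a raw count matrix `counts` (genes x samples) and a factor `condition` describing the groups — adjust the design formula to your experiment:

```r
library(limma)
library(edgeR)

# counts: genes x samples matrix of raw counts (placeholder name)
# condition: factor of sample groups (placeholder name)
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)       # TMM normalization factors

design <- model.matrix(~ condition)

v <- voom(dge, design)            # estimate mean-variance precision weights
fit <- lmFit(v, design)           # fit a linear model per gene
fit <- eBayes(fit)                # moderated t-statistics
topTable(fit, coef = 2)           # top genes for the condition effect
```

Because lmFit fits all genes with vectorized linear algebra, this scales to hundreds of samples far better than per-gene GLM fitting.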
Regarding the memory problem: I did learn in an earlier support thread that, if you run with BiocParallel (parallel=TRUE), it can help to first clear the R environment of any large data objects that are not needed. When parallel workers are spawned, large objects in the environment can end up duplicated in each worker, so removing them (and running gc()) before calling DESeq() in parallel can substantially reduce memory use.
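For example, the cleanup step before a parallel run might look like the following sketch — the object names (`raw_mat`, `big_sce`) are placeholders for whatever large objects you no longer need, and the worker count is illustrative:

```r
library(BiocParallel)

# Drop large objects the workers don't need, then trigger garbage
# collection so the memory is actually released before workers start.
rm(raw_mat, big_sce)   # placeholder names for your own large objects
gc()

# Register a parallel backend, then run DESeq2 in parallel.
register(MulticoreParam(workers = 4))
dds <- DESeq(dds, parallel = TRUE)
```

The key point is ordering: free the memory first, then launch the parallel workers, so each worker inherits as small an environment as possible.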