DESeq2 with many samples
1
1
Entering edit mode
igor ▴ 40
@igor
Last seen 19 months ago
United States

I am trying to run DESeq2 with many samples (single-cell experiment). However, it seems to be choking on the amount of data. When I run DESeq() with parallel=TRUE, I quickly get a memory error (this does not happen for me with smaller experiments). If I run it with parallel=F, I think it just stalls, since it does not finish for over a day. Is there any way around that? Are there some other parameters I can adjust?

deseq2 • 5.0k views
3
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

My typical recommendation when users have 100s of biological replicates within each condition is to use limma + voom, which runs very quickly. In our testing, the differences between our method and linear methods like limma or non-parametric methods like SAMseq disappeared as the number of biological replicates within each condition grew large (as you would expect).

In regards to the memory problem: I did learn in an earlier support thread that, if you run BiocParallel (parallel=TRUE), it can benefit to first clear the R environment of any large data objects which are not needed. R's internal memory garbage collector, gc(), may be making copies of the large objects in the environment in all of the workers.

C: Memory usage for bplapply -- is the entire environment copied over to each worke

0
Entering edit mode

I will look into limma + voom. Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2? I still like the rest of the DESeq functions and how they behave.

0
Entering edit mode

No, you can't split an analysis to be done by a mix of voom and DESeq2. Which functions from DESeq2 are you referring to when you say "I still like the rest of the DESeq functions and how they behave"?

We could show you what the equivalent moves would be on the voom/limma side given the DESeq2 mojo you want to replicate.

0
Entering edit mode

I have not used voom, so I am not sure how many issues I will run into.

One example would be varianceStabilizingTransformation, which I believe is DESeq2-specific.

0
Entering edit mode

Well you could try my suggestion regarding cleaning up the environment before running DESeq().

Note that single cell RNA-seq is expected to have peculiar distributional properties unlike the bulk RNA-seq experiments which DESeq2 was designed for. I'm not an expert on which packages are best for single cell analysis. At the least, you should be looking at the data a lot by eye, e.g. with plotCounts().

"Is there a way to do the normalization with limma + voom and then bring the results back into DESeq2"

Not really. The data objects are structured differently. Some functions, like plotMA() in DESeq2, are just directly calling a function plotMA() defined in another Bioconductor package, geneplotter. This function takes a data.frame as input, see the help page for ?plotMA in geneplotter. Other of our plotting functions are quite simple and you can tweak the code yourself if you are familiar with R.

https://github.com/Bioconductor-mirror/DESeq2/blob/master/R/plots.R

0
Entering edit mode

I am also using a single-cell specific package, but it requires normalized values as input, which is why I still need to use a more traditional package first.