DESeq2 for big data sets
ajajoo • 0
Last seen 4 weeks ago
United States

I ran DESeq2 on around 2,500 subjects with parallel=TRUE, and it used about 40 processors. It took roughly 1.5 weeks to get through estimating dispersions, fitting the model, testing, and so on, all the way to the final dispersion estimates. It also reported replacing outliers for ** genes. After that, however, the program kept running with only a single R process (whereas the earlier steps spawned 40 R processes) for another week or so, with R occupying about 27 GB of RAM. Any idea why that stage takes so long and what is happening there? Unfortunately there was a power shutdown, so the run was killed. Before running it again, I would like to know what I can do to make that stage faster.
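For reference, a run like the one described would typically be set up along these lines (a minimal sketch; the object name `dds`, the design, and the worker count are assumptions, not taken from the post):

```r
library(DESeq2)
library(BiocParallel)

# register a parallel backend with ~40 workers, as described above
register(MulticoreParam(40))

# dds is assumed to be a DESeqDataSet built from the 2,500-subject count matrix,
# e.g. via DESeqDataSetFromMatrix(countData, colData, design = ~ condition)
dds <- DESeq(dds, parallel = TRUE)
res <- results(dds, parallel = TRUE)
```

Note that only the steps DESeq2 has parallelized (dispersion estimation, model fitting, testing) fan out across workers; other stages run in a single R process regardless of the registered backend.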

deseq2 • 278 views

As I've mentioned before on the support site, I myself use limma-voom for hundreds of samples. Fitting the NB GLM is an expensive operation and requires convergence per row.
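For comparison, a minimal limma-voom pipeline looks like this (a sketch only; `counts`, `coldata`, and the `~ condition` design are placeholder names, not from the thread):

```r
library(limma)
library(edgeR)

# counts: genes x samples matrix; coldata: per-sample covariates (assumed names)
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)

design <- model.matrix(~ condition, data = coldata)

# voom transforms counts to log2-CPM with precision weights,
# after which limma's fast linear-model machinery applies
v   <- voom(dge, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)
```

Because limma fits ordinary linear models rather than iterating an NB GLM to convergence per gene, it scales much better to thousands of samples.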

But another factor, maybe less relevant for you than for others in the hundreds-of-samples regime, is that Constantin Ahlmann-Eltze has improved the efficiency of DESeq2 on large sample sizes roughly 10-fold in the development branch (you can already access it on GitHub), which will be released in October 2019.
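If you want to try the development branch before the release, you can install it directly from GitHub (assuming the `remotes` package is installed; `mikelove/DESeq2` is the package's GitHub repository):

```r
# install the development version of DESeq2 from GitHub
remotes::install_github("mikelove/DESeq2")
```

Keep in mind that development-branch code may depend on the development version of Bioconductor, so check the package's DESCRIPTION for version requirements first.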


Thank you for replying.

