DESeq2 for big data sets
ajajoo • 0
@ajajoo-21737
Last seen 2.4 years ago
United States

I ran DESeq2 on around 2,500 subjects with the parallel option enabled, which used about 40 worker processes. It took roughly 1.5 weeks to get through estimating dispersions, fitting the model and testing, and so on, all the way to the final dispersion estimates; it also displayed "replacing outliers" for ** genes. After that, however, the program ran with only a single R process (whereas the earlier steps spawned 40 R processes) for another week or so, with R occupying about 27 GB of RAM. Any idea why that stage takes so long and what is happening there? Unfortunately there was a power shutdown, so the run was killed. Before running it again, I would like to know what I can do to make that stage run faster.
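For reference, the setup described above presumably looks something like the following sketch. Object names (`counts`, `coldata`, `condition`) are placeholders, not taken from the original post:

```r
## Sketch of a parallel DESeq2 run (assumed setup; `counts`, `coldata`,
## and the `~ condition` design are illustrative placeholders).
library(DESeq2)
library(BiocParallel)

## Register 40 worker processes for the parallel steps.
register(MulticoreParam(40))

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ condition)

## parallel = TRUE distributes the per-gene dispersion estimation,
## model fitting, and testing across the registered workers.
dds <- DESeq(dds, parallel = TRUE)
res <- results(dds, parallel = TRUE)
```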

deseq2
@mikelove
Last seen 15 hours ago
United States

As I've mentioned before on the support site, I myself use limma-voom for hundreds of samples. The negative binomial GLM fit is an expensive operation that requires per-gene convergence.
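A minimal limma-voom sketch for this kind of large-sample analysis might look like the following. The `counts` matrix and `design` model matrix are assumed inputs, not objects from the original post:

```r
## limma-voom sketch for large sample sizes (assumed objects:
## a `counts` matrix and a `design` model matrix).
library(edgeR)
library(limma)

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)        # TMM normalization factors

## voom estimates the mean-variance trend and produces precision
## weights, so downstream fitting uses fast linear models rather
## than per-gene NB GLM iterations.
v   <- voom(dge, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)            # top genes for the second coefficient
```

Because lmFit is closed-form least squares, it scales to thousands of samples far more cheaply than iterative GLM fitting.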

Another factor, though perhaps less relevant for you than for others in the hundreds-of-samples regime, is that Constantin Ahlmann-Eltze has improved the efficiency of DESeq2 on large sample sizes roughly 10-fold in the development branch (you can already access it on GitHub), which will be released in October 2019.
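If you want to try the development branch before the release, one way (assuming `BiocManager` is installed, which delegates GitHub installs to the `remotes` package) is:

```r
## Install the DESeq2 development branch directly from GitHub.
## Assumes BiocManager (and its remotes dependency) are available.
BiocManager::install("mikelove/DESeq2")
```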


Thank you for replying.

