Question: DESeq2 for big data sets
ajajoo wrote, 11 weeks ago:

I ran DESeq2 on around 2,500 subjects with the parallel=TRUE option, and it used around 40 processors. It took about 1.5 weeks to get through estimating dispersions, fitting the model, testing, and so on, all the way to the final dispersion estimates. It also reported replacing outliers for ** genes. After that, however, the program kept running in a single R process (whereas the earlier steps opened 40 R processes) for another week or so, with R occupying 27 GB of RAM at that point. Any idea why it took so long at that stage, and what was happening? Unfortunately, a power outage stopped the run, so before running it again I would like to know what I can do to make that stage run faster.

Answer: DESeq2 for big data sets
Michael Love wrote, 11 weeks ago:

As I've mentioned before on the support site, I myself use limma-voom for hundreds of samples. Fitting the negative binomial (NB) GLM is an expensive operation, because it has to be iterated to convergence separately for every row (gene).
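To illustrate what "convergence per row" means, here is a minimal sketch in Python (not DESeq2's actual R/C++ code) that fits a log-link NB GLM to each row of a count matrix by iteratively reweighted least squares (IRLS). It assumes the dispersion alpha is already known and fixed; in DESeq2 the dispersions are themselves estimated per gene, which adds further iterative work. The function names `fit_nb_glm_row` and `fit_all_rows` and the simulated data are hypothetical, for illustration only.

```python
import numpy as np


def fit_nb_glm_row(y, X, alpha, max_iter=100, tol=1e-8):
    """Fit log(mu) = X @ beta for ONE row of counts by IRLS (Fisher scoring),
    treating the NB dispersion alpha as known (an assumption of this sketch).
    Assumes X[:, 0] is an intercept column. Returns (beta, iterations used)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 0.1)           # crude starting value
    for it in range(1, max_iter + 1):
        eta = X @ beta                          # linear predictor
        mu = np.exp(eta)                        # log link
        w = mu / (1.0 + alpha * mu)             # IRLS weight: mu^2 / (mu + alpha*mu^2)
        z = eta + (y - mu) / mu                 # working response
        XtW = X.T * w                           # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter                       # hit the iteration cap


def fit_all_rows(counts, X, alphas):
    """Loop over rows (genes): every row needs its own iterative fit."""
    fits = [fit_nb_glm_row(counts[g], X, alphas[g]) for g in range(counts.shape[0])]
    betas = np.array([b for b, _ in fits])
    iters = np.array([n for _, n in fits])
    return betas, iters


# Simulated toy data (hypothetical): 200 genes x 40 samples, two groups of 20,
# with a true 2-fold change in group 1.
rng = np.random.default_rng(1)
n_genes, n_samples = 200, 40
group = np.repeat([0.0, 1.0], n_samples // 2)
X = np.column_stack([np.ones(n_samples), group])    # intercept + group effect
true_mu = rng.uniform(20, 500, size=n_genes)[:, None] * np.where(group == 1, 2.0, 1.0)
alpha = 0.1
size = 1.0 / alpha                                   # NB "size" = 1 / dispersion
counts = rng.negative_binomial(size, size / (size + true_mu)).astype(float)

betas, iters = fit_all_rows(counts, X, np.full(n_genes, alpha))
print("IRLS iterations per gene: min", iters.min(), "max", iters.max())
print("median estimated log2 fold change:", np.median(betas[:, 1]) / np.log(2))
```

The total cost is roughly (number of rows) × (iterations per row) × (cost of one weighted least-squares solve, which grows with the number of samples), so both more genes and more samples make it slower. By contrast, limma-voom fits each gene with a single, non-iterative weighted least-squares solve.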

Another factor, maybe less relevant for you than for others in the hundreds-of-samples regime, is that Constantin Ahlmann-Eltze has improved the efficiency of DESeq2 on large sample sizes by 10-fold in the development branch (you can already access it on GitHub), which will be released in October 2019.


ajajoo replied, 11 weeks ago: Thank you for replying.