Question: DESeq2 rlog function takes too long
0
3.4 years ago by
bharata180340
Japan
bharata180340 wrote:

Hello,

I have a fairly large read count matrix from TCGA: 577 samples and 18,522 genes. Running DESeq2 to calculate log fold changes did not take that long, around 3-4 hours. After that, I wanted to use the rlog function to get log-transformed gene expression values, but it ran for almost 24 hours without finishing. I cancelled it because I assumed something had gone wrong. I have an Intel® Core™ i7 CPU 975 @ 3.33GHz × 8 with 24 GB of RAM. As far as I know, R cannot use multiple cores for the DESeq2 calculation. Are there any suggestions for how to optimize this process?

deseq2 • 4.0k views
modified 3.4 years ago by Joseph Bundy20 • written 3.4 years ago by bharata180340
Answer: DESeq2 rlog function takes too long
5
3.4 years ago by
Michael Love23k
United States
Michael Love23k wrote:

In the vignette and the workflow, I suggest using the VST instead when there are hundreds of samples:

Note on running time: if you have many samples (e.g. 100s), the rlog function might take too long, and the variance stabilizing transformation might be a better choice. The rlog and VST have similar properties, but the rlog requires fitting a shrinkage term for each sample and each gene which takes time.

EDIT (Oct 2017): the code snippet below is no longer necessary, as the speedup is implemented in the function vst(), since DESeq2 version 1.12.

In addition to this suggestion, here is a snippet of code to speed up the VST even more.

I keep planning to add this to DESeq2 as a proper function, but haven't done so yet.
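
For readers landing here later, a minimal sketch of calling vst() directly, which per the October 2017 note above already includes the speedup. This is not the original snippet from this answer. The simulated dataset from makeExampleDESeqDataSet() is only a stand-in; with real data, `dds` would be your own DESeqDataSet.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption: your object is a DESeqDataSet)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 20)

# vst() fits the dispersion trend on a subset of genes (argument nsub,
# default 1000) and then applies the transformation to all genes
vsd <- vst(dds, blind = TRUE)

# Matrix of transformed values, genes in rows and samples in columns
mat <- assay(vsd)
dim(mat)
```

If the dataset has fewer genes than `nsub` with a mean normalized count above 5, vst() stops with an error; in that case varianceStabilizingTransformation() can be called directly instead.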

 

modified 20 months ago • written 3.4 years ago by Michael Love23k

Thank you for your code. I will try that. 

written 3.4 years ago by bharata180340
Answer: DESeq2 rlog function takes too long
2
3.4 years ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

You probably already know this, but the rpkm() and cpm() functions in the edgeR package compute log-transformed gene expression very quickly when called with log=TRUE. They compute a simple but effective regularized log transformation.
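
A minimal sketch of the edgeR route, assuming a plain count matrix with genes in rows and samples in columns. The simulated `counts` matrix is a hypothetical stand-in for the real data, and the TMM normalization step is an optional addition, not something the answer above prescribes.

```r
library(edgeR)

# Hypothetical stand-in for a real count matrix (genes x samples)
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)  # TMM normalization factors (optional step)

# log2 counts per million; prior.count is added before taking logs,
# which damps the variability of low counts
logcpm <- cpm(y, log = TRUE, prior.count = 2)
dim(logcpm)
```

Larger values of `prior.count` shrink the log values of low-count genes more strongly toward each other.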

modified 3.4 years ago • written 3.4 years ago by Gordon Smyth37k

Thanks for the suggestion. This is the first time I have worked with this much data. I will try your suggestion.

written 3.4 years ago by bharata180340

For the purpose of leaving breadcrumbs, a similar function in DESeq2 is normTransform, which divides out library size factors, adds a pseudocount, and log2 transforms. This was added when plotPCA was added to BiocGenerics, so that DESeq2::plotPCA could easily be run on a matrix of log normalized counts, for comparing various transformation options.
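
A short sketch of that, again on simulated data as a stand-in (the `condition` column comes from makeExampleDESeqDataSet(), not from any real experiment). With the default arguments, normTransform() returns log2(normalized count + 1).

```r
library(DESeq2)

# Simulated stand-in for a real DESeqDataSet
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- estimateSizeFactors(dds)

# Defaults: f = log2 and pc = 1, i.e. log2(normalized count + 1)
ntd <- normTransform(dds)

# Same values computed by hand, for comparison
manual <- log2(counts(dds, normalized = TRUE) + 1)

# PCA coordinates; drop returnData = TRUE to draw the plot instead
pca <- plotPCA(ntd, intgroup = "condition", returnData = TRUE)
head(pca)
```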

written 3.4 years ago by Michael Love23k
Answer: DESeq2 rlog function takes too long
0
3.4 years ago by
Joseph Bundy20
United States
Joseph Bundy20 wrote:

Hi there,

I've been encountering similar problems with long wait times on certain R functions (especially those in DEXSeq and WGCNA), and I have only 60 samples. If waiting around on R is a problem you face often, I would give the Intel MKL libraries a look, discussed here: http://brettklamer.com/diversions/statistical/faster-blas-in-r/ They speed up certain calculations and allow some calculations in R to use multiple cores.

The easiest way to get the libraries is to simply download Revolution R (which is free, and automatically recognized by RStudio):
https://mran.revolutionanalytics.com/download/#download

I gave it a try at my PI's suggestion, and it has cut down some of my analysis times considerably. Just make sure you install both Revolution R AND the MKL library. Just to be clear, as I realize I sound a bit like a salesman, I am not an employee of Revolution Analytics. I just downloaded and used their library because it was advertised as doing mathematical calculations more efficiently and enabling multi-threaded calculations (which I have confirmed by watching the task manager).
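
To check which BLAS/LAPACK a given R session is actually using, something like the following sketch may help. It assumes R 3.4.0 or later, and the 500 by 500 matrix is an arbitrary size picked only for illustration.

```r
# Paths of the BLAS and LAPACK libraries this R session is linked against
# (the BLAS entry may be an empty string on some platforms)
blas_path <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
print(c(BLAS = blas_path, LAPACK = lapack_path))

# Time a dense matrix product, the kind of operation an optimized BLAS speeds up
set.seed(1)
x <- matrix(rnorm(500 * 500), nrow = 500)
elapsed <- system.time(xx <- crossprod(x))[["elapsed"]]
elapsed
```

Running the same timing before and after switching libraries gives a rough idea of whether the change matters for dense linear algebra on your machine.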

Unfortunately, the MKL libraries aren't going to help with memory (RAM) management, which I suspect is why you're getting an error when doing the rlog transformation. Could you give more information about the error? If you already have one 577 by 18,522 matrix in the R workspace, I can't imagine that you have much room for another one. Monitor your memory usage in the task manager the next time you try the transformation and see if it's at capacity.

If it is indeed at capacity, you can try to better manage which objects you keep in the R environment with the rm() and gc() functions. rm() removes the objects you name from the R environment, and gc() prompts R to free unused memory (and possibly return it to the operating system) for subsequent calculations. You might also go through your code and make sure that you're not generating too many redundant objects to begin with (if you're like me, you have a lot of them). My current Windows installation has 128GB of RAM, and even with all that I've still had to remove certain objects to make room for others (which is admittedly mostly due to my sloppy programming and not the system's fault).
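
A rough way to check how much memory a matrix of this size occupies, and to free it again, using base R only. `count_mat` is a hypothetical all-zero stand-in with the dimensions from the question. A single matrix of this shape comes out at roughly 40 MB stored as integers and about 80 MB as doubles; objects created during model fitting may hold several such matrices, so treat these numbers as a lower bound.

```r
# Hypothetical stand-in with the dimensions from the question
n_genes <- 18522
n_samples <- 577
n_cells <- n_genes * n_samples

count_mat <- matrix(0L, nrow = n_genes, ncol = n_samples)

size_int <- object.size(count_mat)      # integer storage, 4 bytes per cell
size_dbl <- object.size(count_mat + 0)  # double storage, 8 bytes per cell
print(format(size_int, units = "MB"))
print(format(size_dbl, units = "MB"))

# Remove the object by name, then trigger garbage collection
rm(count_mat)
invisible(gc())
```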

If you still don't have the RAM to run your analysis, I'd recommend simply installing more if your board will support it.  
 

modified 3.4 years ago • written 3.4 years ago by Joseph Bundy20