Question: rlog fast=TRUE not correct?
ldetorrente wrote, 3.3 years ago:

Hi, they finally updated R on our cluster, so I am now using DESeq2_1.10.1 with R 3.2.2. However, I was surprised to see that the "fast" option in rlog is no longer available. Is there a reason why? Was the approximation not good or not correct? I tried to run rlog in v1.10.1, but it is taking ages (3 days and it's still not done). I have 683 samples, and I know the vignette says that with more than 100 samples it is better to use the vst function. However, the library sizes are very different, so I think rlog would be a better choice in my case. If the approximation was correct, I am thinking of taking the code from the old package and running just rlog with it on my dataset. Do you have other suggestions?
Answer: rlog fast=TRUE not correct?
Michael Love (United States) wrote, 3.3 years ago:

hi,

Yes, the reason is that the approximation was not necessarily a good one, and we wanted users to move to vst() instead when there are hundreds of samples. We didn't come up with a good solution for something close to the rlog in that case.

How far apart are the library sizes? What is quantile(sizeFactors(dds), 0:10/10)? Are the library sizes confounded with the treatment or condition?
Answer: rlog fast=TRUE not correct?
ldetorrente wrote, 3.3 years ago:

Here are the results that you asked for. Tissue_Type is my condition, with 4 levels, and I actually have 495 samples in total (I am working on a subset at the moment). When I run an ANOVA to test the association between the size factors and Tissue_Type, I get a significant p-value, as you can see below.

> quantile(sizeFactors(dds), 0:10/10)
         0%         10%         20%         30%         40%         50%
0.004344844 0.225409822 0.557916071 0.798786169 1.027512508 1.368807900
        60%         70%         80%         90%        100%
1.668758846 1.996492036 2.453239510 3.308225474 8.874847256

> summary(aov(sizeFactors(dds)~Tissue_Type))[[1]][["Pr(>F)"]][[1]]
[1] 3.366477e-07

I would remove the samples with a very low size factor, e.g. the minimum one here (about 0.004). These are essentially failed experiments, which can be identified by sample QA (e.g. FastQC). These samples will throw off exploratory plots like PCA no matter how you transform.
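One way to apply this kind of filter is sketched below in R. It is only an illustration, not the exact procedure from this thread: the data are simulated with makeExampleDESeqDataSet(), one sample is deliberately given a tiny library, and the cutoff of 0.1 on the size factor is an arbitrary assumption that should be chosen after looking at the quantiles and the sample QA.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption): 12 samples, the first
# of which is sequenced about 100x more shallowly than the others.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12, interceptMean = 8,
                               sizeFactors = c(0.01, rep(1, 11)))

# Estimate size factors and look at their spread, as asked in the thread.
dds <- estimateSizeFactors(dds)
print(quantile(sizeFactors(dds), 0:10/10))

# Hypothetical cutoff: drop samples whose size factor is below 0.1.
min_size_factor <- 0.1
keep <- sizeFactors(dds) >= min_size_factor
dds_ok <- dds[, keep]

# Re-estimate size factors on the retained samples only.
dds_ok <- estimateSizeFactors(dds_ok)
print(table(kept = keep))
```

Subsetting the columns keeps colData in sync with the counts, so the design variable (Tissue_Type in the real data) does not need to be filtered separately.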

It's not very good for exploratory plots like PCA or for differential expression testing if a technical factor such as the sequencing depth is confounded with the tissue type. I know you can't help this after the fact but it's good information to pass along and keep in mind when designing experiments.

I would suggest using vst(), which is in the current version of DESeq2 (1.12), after removing the sample with very low sequencing depth. You can compare it with a simple log transform of normalized counts with a large pseudocount, e.g. 10, using the function normTransform().
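The comparison suggested here could look roughly like the following R sketch. It uses simulated data from makeExampleDESeqDataSet() rather than the real samples, and the values n = 2000 and nsub = 500 are assumptions chosen only so the example runs quickly. It needs a DESeq2 version that provides vst() (1.12 or later, per the post above).

```r
library(DESeq2)

# Simulated stand-in for the real, already-filtered dataset (assumption).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- estimateSizeFactors(dds)

# Variance stabilizing transformation; vst() fits the dispersion trend on a
# subset of 'nsub' genes, which keeps it fast with many samples.
vsd <- vst(dds, blind = TRUE, nsub = 500)

# Simple alternative: log2(normalized count + 10), i.e. a large pseudocount.
ntd <- normTransform(dds, pc = 10)

# PCA coordinates from each transformation, for a side-by-side comparison.
pca_vst <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
pca_ntd <- plotPCA(ntd, intgroup = "condition", returnData = TRUE)
print(head(pca_vst))
print(head(pca_ntd))
```

If the two sets of PCA coordinates tell the same story about the samples, the choice of transformation matters little for the exploratory plots. If they differ, it is worth checking whether the leading components track the size factors.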

I see that your system administrators updated to R 3.2.2 but this is no longer current. If I were you I would ask them to offer the current version of R, which is 3.3.

Reply written 3.3 years ago by Michael Love

Note that you can add a comment to an existing post using the Add Comment / Add Reply buttons. The Add Answer text box at the bottom is for posting an answer to the original post (your original question).

Reply written 3.3 years ago by Michael Love