rlog fast=TRUE not correct?
ldetorrente ▴ 10
@ldetorrente-10730

Hi, they finally updated R on our cluster, so I am now using DESeq2_1.10.1 with R 3.2.2. However, I was surprised to see that the "fast" option in rlog is no longer available. Is there a reason why? Was the approximation not good or not correct? I tried to run rlog in v1.10.1 but it is taking ages (3 days and it's still not done). I have 683 samples, and I know the vignette says that with more than 100 samples it is better to use the vst function. However, the library sizes are very different, so I think rlog would be a better choice in my case. If the approximation was correct, I am thinking of taking the raw code from the old package and running just rlog with it on my dataset. Do you have other suggestions?

deseq2 rlog transformation vst
@mikelove

hi,

Yes, the reason is that the approximation was not necessarily a good one, and we wanted users to move to vst() instead when there are 100s of samples. We didn't come up with a good solution for something close to the rlog in the case of 100s of samples.

How far apart are the library sizes? What is quantile(sizeFactors(dds), 0:10/10)? Are the library sizes confounded with the treatment or condition?

ldetorrente ▴ 10
@ldetorrente-10730

Here are the results that you asked for. Tissue_Type is my condition, with 4 levels, and I actually have 495 samples in total (I am working on a subset at the moment). When I run an ANOVA to test the association between the size factors and Tissue_Type, I get a significant p-value, as you can see below.

> quantile(sizeFactors(dds), 0:10/10)
         0%         10%         20%         30%         40%         50%
0.004344844 0.225409822 0.557916071 0.798786169 1.027512508 1.368807900
        60%         70%         80%         90%        100%
1.668758846 1.996492036 2.453239510 3.308225474 8.874847256

> summary(aov(sizeFactors(dds)~Tissue_Type))[[1]][["Pr(>F)"]][[1]]
[1] 3.366477e-07

0
Entering edit mode

I would remove the samples with a very low size factor, e.g. the minimum one here. These are essentially failed experiments, which can be identified by sample QA (e.g. FastQC). These samples will throw off exploratory plots like PCA no matter how you transform.
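As a rough illustration of dropping low-depth samples, here is a minimal sketch on simulated data from makeExampleDESeqDataSet(), not the dataset in this thread. The cutoff min_sf and the names cts and dds_kept are hypothetical; no specific threshold is recommended above, so pick one after looking at the size factors and the QA reports.

```r
library(DESeq2)

# Simulated stand-in data (an assumption for illustration, not the poster's dataset).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Mimic one failed library by scaling sample 1 down to about 2% of its depth.
cts <- counts(dds)
cts[, 1] <- as.integer(round(cts[, 1] * 0.02))
dds <- DESeqDataSetFromMatrix(cts, colData(dds), design = ~ condition)

# Estimate size factors and inspect them before deciding what to drop.
dds <- estimateSizeFactors(dds)
print(round(sizeFactors(dds), 3))

# Hypothetical cutoff: keep only samples whose size factor is at least min_sf.
min_sf <- 0.2
dds_kept <- dds[, sizeFactors(dds) >= min_sf]
print(dim(dds_kept))
```

Subsetting columns with a logical vector keeps the colData in sync, so the remaining size factors travel with dds_kept. It may be worth re-running estimateSizeFactors() on the filtered object, since the reference used for the size factors changes once samples are removed.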

It's not very good for exploratory plots like PCA or for differential expression testing if a technical factor such as the sequencing depth is confounded with the tissue type. I know you can't help this after the fact but it's good information to pass along and keep in mind when designing experiments.

I would suggest using vst(), which is in the current version of DESeq2 (1.12), after removing the sample with very low sequencing depth. You can compare it with a simple log transform of normalized counts with a large pseudocount, e.g. 10, using the function normTransform().
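The comparison suggested above could look something like the following sketch. It again uses simulated data from makeExampleDESeqDataSet() as a stand-in, and the object names (vsd, ntd, pca_vst, pca_log) are made up for the example. The simulated data has no real condition effect by default, so the PCA output only shows the mechanics.

```r
library(DESeq2)

# Simulated stand-in data (an assumption for illustration, not the poster's dataset).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- estimateSizeFactors(dds)

# Variance stabilizing transformation, fitted on a subset of genes for speed.
vsd <- vst(dds, blind = TRUE)

# Simple log2 of normalized counts with a large pseudocount of 10.
ntd <- normTransform(dds, pc = 10)

# Get PCA coordinates as data frames instead of plots, so the two can be compared.
pca_vst <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
pca_log <- plotPCA(ntd, intgroup = "condition", returnData = TRUE)
print(attr(pca_vst, "percentVar"))
print(attr(pca_log, "percentVar"))
```

Dropping returnData = TRUE draws the usual PCA plot for each transformation. On a real dataset, setting intgroup to the condition of interest (Tissue_Type in this thread) would show whether the two transformations tell a similar story.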

I see that your system administrators updated to R 3.2.2 but this is no longer current. If I were you I would ask them to offer the current version of R, which is 3.3.
