Is it valid to do gene comparisons on VST or rlog transformed data?
JB ▴ 10
@2b6628e8

Hi,

I'm wondering whether it is valid to run per-gene comparisons, such as a t-test between treatment groups, on VST- or rlog-transformed data. I know that differential expression is best done with the DESeq() function and that VST or rlog data are meant for clustering or visualization, but I am exploring different ways of grouping treatments and of comparing gene expression, so it is easier for me to work with a table of transformed counts.

However, some things I've read say this is not recommended, and I haven't seen a clear explanation of why not. If VST or rlog is not similar to the transformation that DESeq() uses internally for differential expression (sorry, I didn't fully understand the differences when I read the paper, especially with blind = FALSE), is there a way to get the count table that DESeq() uses under the hood (size-normalized, with dispersions estimated, etc.)?

@mikelove

Sorry I missed this; I only follow package tags.

Just to start, I recommend vst() over rlog().
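
For anyone who wants the transformed table itself, here is a minimal sketch of producing it with vst(). It runs on simulated counts from makeExampleDESeqDataSet() rather than real data, and the object name `vst_table` is only illustrative. Setting blind = FALSE lets the transformation use the design when estimating the dispersion trend.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset (an assumption for illustration): 3000 genes, 12 samples,
# with a two-level 'condition' factor in colData
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Variance-stabilizing transformation; blind = FALSE uses the design formula
vsd <- vst(dds, blind = FALSE)

# Plain numeric matrix: genes in rows, samples in columns, roughly log2 scale
vst_table <- assay(vsd)
dim(vst_table)
head(vst_table[, 1:3])
```

The resulting matrix can be converted with as.data.frame() if a data frame is easier to work with.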

If you have a large sample size (e.g. > 20 replicates per group), I would expect vst() plus a linear model to be comparable to the DESeq2 model. With a moderate or small sample size, the precision of individual observations really matters. vst() output doesn't tell you about the precision of the expression values, whereas DESeq() takes it into account.
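
To make the comparison concrete, here is a rough sketch that runs both routes on the same simulated data: a per-gene Welch t-test on vst() values, and the DESeq() model on the raw counts. The dataset, the filtering threshold, the 0.1 cutoff, and names such as `t_pvals` are assumptions chosen for illustration, not recommendations. On the "under the hood" question: DESeq() fits a negative binomial model to the raw counts using size factors and per-gene dispersions, so there is no single transformed table it works from, but those pieces can be inspected as shown at the end.

```r
library(DESeq2)

set.seed(1)
# Simulated data (assumption): 6 samples per condition, some true differences
dds <- makeExampleDESeqDataSet(n = 3000, m = 12, betaSD = 1)

# Illustrative pre-filter: drop genes with very low counts
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep, ]

# Route 1: t-test per gene on variance-stabilized values
vsd  <- vst(dds, blind = FALSE)
mat  <- assay(vsd)
cond <- colData(dds)$condition
t_pvals <- apply(mat, 1, function(y) {
  # Return NA if t.test() cannot be computed for a gene
  tryCatch(t.test(y ~ cond)$p.value, error = function(e) NA_real_)
})
t_padj <- p.adjust(t_pvals, method = "BH")

# Route 2: the DESeq2 model on raw counts
dds <- DESeq(dds)
res <- results(dds)

# Cross-tabulate which genes each route calls at an adjusted p-value of 0.1
table(t_test = t_padj < 0.1, DESeq2 = res$padj < 0.1)

# Quantities DESeq() estimates and uses internally
head(sizeFactors(dds))                  # per-sample size factors
head(dispersions(dds))                  # per-gene dispersion estimates
head(counts(dds, normalized = TRUE))    # counts divided by size factors
```

Changing `m` in the simulation is one way to see how the agreement between the two routes depends on the number of replicates per group.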