DESeq2: Is the vst/rlog transformation applied to raw counts or normalised counts?
salamandra ▴ 20
@salamandra-12825
Last seen 3.0 years ago
Portugal

Hi,

When applying the vst or rlog transformation to RNA-seq data in order to later visualize it in a heatmap with dendrograms, is vst/rlog applied to the raw counts or to the normalized counts when we call assay(rlog(ddsHTSeq, blind=F))?

By normalized counts I mean counts corrected with the scaling factors calculated by the DESeq() command.

If it is applied to raw counts, shouldn't we somehow correct the vst/rlog values for sequencing depth and for average gene expression across samples before plotting the heatmap and dendrograms?

Best

 

deseq2 vst rlog transformation • 8.5k views

I’m out of the office for winter break, but will reply when I’m back.

@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…

Have a look at the manual pages of these functions. The first sentence of that for varianceStabilizingTransformation says "This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors)." For rlog, it says "This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size."

Do try to read the documentation and a little bit about the underlying methods; you'll find that you'll be more efficient and have much more fun with the software.


Honestly, even after reading the manual pages I don't understand them because of the statistical terms. I'm trying to learn more in that area, though.

So, it means vst/rlog 'correct' for library size, but I still didn't get whether they correct for average gene expression across samples.

And also, in this case should we use 'blind=F' or 'blind=T'?


They are roughly log2(normalized counts) but with variance stabilization. If you were asking whether they are mean-centered, then the answer is no.
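To check this on data, here is a minimal sketch. It assumes the DESeq2 package is installed and uses the simulated dataset from `makeExampleDESeqDataSet` instead of the `ddsHTSeq` object from the question. The object names and the cutoff of 100 are illustrative only.

```r
library(DESeq2)

set.seed(1)
# Simulated counts standing in for a real dataset (assumption, not the poster's data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- estimateSizeFactors(dds)

# Normalized counts: raw counts divided by the per-sample size factors
norm_counts <- counts(dds, normalized = TRUE)

# The transformation is called on the DESeqDataSet holding the raw counts;
# the size factors stored in the object are used internally
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
vsd_mat <- assay(vsd)

# For well-expressed genes, compare the transformed values with
# log2 of the normalized counts (arbitrary cutoff of 100 for illustration)
high <- rowMeans(norm_counts) > 100
cor(as.vector(vsd_mat[high, ]), as.vector(log2(norm_counts[high, ] + 1)))
```

So there is no need to normalize by hand before calling the transformation: the raw-count object goes in, and the output already accounts for library size.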


Thanks. So, the counts are not scaled by row, i.e. by mean gene expression (unlike the normalized counts for differential expression analysis)? Is it OK to subtract the log of each gene's mean expression from the rlog values before building the dendrogram, then?


It’s up to you. If you want to remove the mean across samples, which is useful sometimes, then you can do this. It’s an option in all the heat map programs.
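As a minimal sketch in base R, removing the per-gene mean could look like this. The matrix `mat` is a hypothetical stand-in for `assay(rlog(dds))` or `assay(vst(dds))`. Since those values are already on the log2 scale, the sketch subtracts the row mean of the transformed values directly rather than the log of a mean.

```r
set.seed(1)
# Hypothetical stand-in for a genes x samples matrix of rlog/vst values (log2 scale)
mat <- matrix(rnorm(8 * 5, mean = 8), nrow = 8,
              dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# Row-centering: subtract each gene's mean across samples.
# The vector of row means is recycled down the columns, so row i loses its own mean.
centered <- mat - rowMeans(mat)

# Sample dendrogram from Euclidean distances on the centered values
hc_samples <- hclust(dist(t(centered)))

# Gene dendrogram: here centering does change the distances between rows
hc_genes <- hclust(dist(centered))
```

Note that subtracting a per-gene constant leaves Euclidean distances between samples unchanged, so with this distance the sample dendrogram is the same before and after centering. The centering mainly affects the heatmap colours and the clustering of genes. Base R's `heatmap()` scales rows by default (`scale = "row"`), which both centers each row and divides it by its standard deviation.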
