Question

DESeq2: Is the vst/rlog transformation applied to raw counts or normalized counts?

salamandra ▴ 20

@salamandra-12825

Portugal

Hi,

When applying the vst or rlog transformation to RNA-seq data in order to later visualize it in a heatmap with dendrograms, is the transformation applied to the raw counts or to the normalized counts when we call assay(rlog(ddsHTSeq, blind=F))?

By normalized counts I mean counts corrected with the scaling factors calculated by the DESeq() command.

If it is applied to raw counts, shouldn't we correct the vst/rlog values in some way for sequencing depth and for average gene expression across samples before plotting the heatmap and dendrograms?

Best

deseq2 vst rlog transformation • 9.5k views

updated 7.2 years ago by Wolfgang Huber ★ 13k • written 7.2 years ago by salamandra ▴ 20

I’m out of the office for winter break, but will reply when I’m back.

written 7.2 years ago by Michael Love 43k

score 2 · Answer 1 · 2019-01-01

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

EMBL European Molecular Biology Laborat…

Have a look at the manual pages of these functions. The first sentence of the page for varianceStabilizingTransformation says "This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors)." For rlog, it says "This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size." In other words, both functions take the raw counts as input and normalize for library size as part of the transformation.

Do try to read the documentation and a little bit about the underlying methods; you'll find that you'll be more efficient and have much more fun with the software.
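As a minimal sketch of what the manual pages describe, the following uses simulated data from makeExampleDESeqDataSet() as a stand-in for a real object such as the ddsHTSeq in the question (the simulated data, the seed, and the threshold of 100 are assumptions made for illustration). It requires the DESeq2 package. Both functions are given the object holding raw counts, and the result is compared with a plain log2 of the size-factor-normalized counts.

```r
library(DESeq2)

# Simulated data for illustration only; replace with your own DESeqDataSet.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Estimate size factors explicitly so normalized counts can be extracted below.
dds <- estimateSizeFactors(dds)

# The input object holds raw counts; normalization happens inside the functions.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
rld <- rlog(dds, blind = FALSE)

# For comparison: plain log2 of normalized counts with a pseudocount of 1.
norm_counts <- counts(dds, normalized = TRUE)
log_norm <- log2(norm_counts + 1)

# Restrict to genes with high counts, where the transformations are expected
# to behave much like log2(normalized counts).
high <- rowMeans(norm_counts) > 100
cor_vst <- cor(assay(vsd)[high, 1], log_norm[high, 1])
cor_rlog <- cor(assay(rld)[high, 1], log_norm[high, 1])
```

The vst() function is a faster wrapper around varianceStabilizingTransformation() and is called the same way on the DESeqDataSet.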

written 7.2 years ago by Wolfgang Huber ★ 13k

Honestly, even after reading the manual pages I don't fully understand them because of the statistical terms. I'm trying to learn more in that area, though.

So it means vst/rlog 'correct' for library size, but I still didn't get whether they correct for average gene expression across samples.

And also, in this case should we use 'blind=F' or 'blind=T'?

written 7.1 years ago by salamandra ▴ 20

They are roughly log2(normalized counts) but with variance stabilization. If you were asking if they are mean centered then the answer is no.
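To make "log2(normalized counts)" and "not mean centered" concrete, here is a toy sketch in base R. It is a simplified re-implementation of median-of-ratios size factors on a made-up 3 x 2 count matrix (the matrix and all variable names are assumptions, and zero counts are not handled). It is not the code DESeq2 runs, and it leaves out the variance stabilization that vst/rlog add on top.

```r
# Toy raw counts: sample B was sequenced twice as deeply as sample A.
raw <- cbind(A = c(10, 20, 30), B = c(20, 40, 60))
rownames(raw) <- c("gene1", "gene2", "gene3")

# Median-of-ratios size factors (simplified; assumes no zero counts):
# each sample's counts are compared with the per-gene geometric mean.
log_geo_means <- rowMeans(log(raw))
size_factors <- apply(raw, 2, function(x) exp(median(log(x) - log_geo_means)))

# Divide each column by its size factor, then take log2.
norm_counts <- sweep(raw, 2, size_factors, "/")
log_norm <- log2(norm_counts)

# The depth difference is removed: both columns now hold the same values.
same_after_norm <- isTRUE(all.equal(log_norm[, "A"], log_norm[, "B"]))

# Each gene keeps its own expression level: row means are not zero.
gene_means <- rowMeans(log_norm)
```

Library size is corrected, but the values are not centered per gene, so highly expressed genes still sit higher than lowly expressed ones.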

written 7.1 years ago by Michael Love 43k

Thanks. So the counts are not scaled by row, i.e. by mean gene expression (unlike the normalized counts for differential expression analysis)? Is it OK, then, to subtract the log of each gene's mean expression from the rlog values before building the dendrogram?

written 7.1 years ago by salamandra ▴ 20

It’s up to you. If you want to remove the mean across samples, which is useful sometimes, then you can do this. It’s an option in all the heat map programs.
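Removing the mean across samples can also be done by hand before clustering. The sketch below uses base R only, with a random matrix standing in for assay(rlog(dds, blind = FALSE)); the matrix, its dimensions, and the variable names are assumptions for illustration.

```r
# Hypothetical matrix of transformed values (genes x samples), standing in
# for the output of assay() on an rlog- or vst-transformed object.
set.seed(1)
mat <- matrix(rnorm(20 * 6, mean = 8, sd = 2), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

# Subtract each gene's mean across samples
# (the vector of row means is recycled down every column).
centered <- mat - rowMeans(mat)

# Row means are now zero up to floating-point error.
max_abs_row_mean <- max(abs(rowMeans(centered)))

# Dendrograms from Euclidean distances on the centered values.
gene_tree <- hclust(dist(centered))
sample_tree <- hclust(dist(t(centered)))

# Euclidean distances between samples are the same before and after centering.
sample_dist_unchanged <- isTRUE(all.equal(as.vector(dist(t(mat))),
                                          as.vector(dist(t(centered)))))
```

With Euclidean distance, subtracting a per-gene constant leaves the distances between samples unchanged, so the sample dendrogram is the same either way. Centering mainly affects the gene dendrogram and the colors in the heatmap.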

written 7.1 years ago by Michael Love 43k