Question: DESeq2: Is vst/rlog transformation applied to raw counts or normalised counts?
salamandra wrote (8 months ago):

Hi,

When applying the vst or rlog transformation to RNA-seq data in order to visualize it later in a heatmap with dendrograms, is vst/rlog applied to raw counts or to normalized counts when we do assay(rlog(ddsHTSeq, blind=F))?

By normalized counts I mean counts corrected with the scaling factors calculated by the DESeq() command.

If it is applied to raw counts, shouldn't we correct the vst/rlog values in some way for sequencing depth and average gene expression across samples before plotting the heatmap and dendrograms?

Best

 


I’m out of the office for winter break, but will reply when I’m back.

Reply written 8 months ago by Michael Love
Answer: DESeq2: Is vst/rlog transformation applied to raw counts or normalised counts?
Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote (8 months ago):

Have a look at the manual pages of these functions. The first sentence of that for varianceStabilizingTransformation says "This function calculates a variance stabilizing transformation (VST) from the fitted dispersion-mean relation(s) and then transforms the count data (normalized by division by the size factors or normalization factors)." For rlog, it says "This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size."

Do try to read the documentation and a little bit about the underlying methods; you'll find that you'll be more efficient and have much more fun with the software.

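
To make the phrase "normalized by division by the size factors" from the quoted manual page concrete, here is a minimal plain-Python sketch. It is not DESeq2 code: the gene names, counts, and size factors are made-up values, and the size factors are simply assumed rather than estimated the way DESeq2 estimates them.

```python
# Hypothetical raw counts: each gene maps to [count in sample A, count in sample B].
# Sample B is assumed to have been sequenced twice as deeply as sample A.
raw_counts = {
    "geneX": [100, 200],
    "geneY": [10, 20],
    "geneZ": [500, 1000],
}

# Assumed per-sample size factors (illustrative values only; DESeq2 would
# estimate its own size factors from the data).
size_factors = [1.0, 2.0]


def normalize(counts, factors):
    """Divide every count by the size factor of the sample it came from."""
    return {
        gene: [count / factor for count, factor in zip(row, factors)]
        for gene, row in counts.items()
    }


normalized = normalize(raw_counts, size_factors)
for gene, row in normalized.items():
    print(gene, row)
```

After the division, the two samples agree for every gene, so the difference in sequencing depth no longer shows up in the values.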

Honestly, even after reading the manual pages I don't understand them because of the statistical terms. I'm trying to learn more in that area, though.

So it means vst/rlog 'correct' for library size, but I still didn't get whether they correct for average gene expression across samples.

And also, in this case should we use 'blind=F' or 'blind=T'?

Reply written 8 months ago by salamandra (modified 8 months ago)

They are roughly log2(normalized counts) but with variance stabilization.  If you were asking if they are mean centered then the answer is no.

Reply written 8 months ago by Michael Love
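
As a rough illustration of "not mean centered", the plain-Python sketch below takes log2 of some made-up normalized counts and prints each gene's mean across samples. The log2 with a pseudocount of 1 is only a crude stand-in chosen for this example: it has no variance stabilization and is not what vst or rlog actually compute.

```python
import math

# Hypothetical normalized counts: one lowly and one highly expressed gene,
# three samples each.
normalized = {
    "geneLow": [7.0, 15.0, 31.0],
    "geneHigh": [1023.0, 2047.0, 4095.0],
}


def log2_rows(counts, pseudocount=1.0):
    """Plain log2(count + pseudocount) per value; no variance stabilization."""
    return {
        gene: [math.log2(value + pseudocount) for value in row]
        for gene, row in counts.items()
    }


log_values = log2_rows(normalized)

# Mean of each gene across samples: these differ between genes and are not
# zero, i.e. the log-scale values still carry each gene's expression level.
row_means = {gene: sum(row) / len(row) for gene, row in log_values.items()}
print(row_means)
```

The two genes keep very different row means, which is the sense in which log-scale values of this kind are not mean centered.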

Thanks. So the counts are not scaled by the row mean, i.e. by mean gene expression (unlike the normalized counts for differential expression analysis)? Is it OK, then, to subtract the log of each gene's mean expression from the rlog values before building the dendrogram?

Reply written 8 months ago by salamandra

It’s up to you. If you want to remove the mean across samples, which is useful sometimes, then you can do this. It’s an option in all the heat map programs.

Reply written 8 months ago by Michael Love
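
Removing the mean across samples can be sketched in a few lines of plain Python. The values below are made-up log-scale numbers standing in for a transformed matrix, and the sketch subtracts each gene's mean of the already-transformed values, which is one reading of "remove the mean across samples"; it is not the same thing as subtracting the log of the mean count.

```python
# Hypothetical log-scale values (e.g. standing in for transformed data):
# each gene maps to its values in three samples.
transformed = {
    "geneLow": [3.0, 4.0, 5.0],
    "geneHigh": [10.0, 11.0, 12.0],
}


def center_rows(values):
    """Subtract each gene's mean across samples from that gene's values."""
    centered = {}
    for gene, row in values.items():
        mean = sum(row) / len(row)
        centered[gene] = [value - mean for value in row]
    return centered


centered = center_rows(transformed)
for gene, row in centered.items():
    print(gene, row)
```

After centering, both genes show the same profile across samples, so a heatmap or dendrogram built on these values reflects the pattern of change rather than the absolute expression level.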