Question

DESeq2: What is the unit of DESeq2 normalized read count (VST)? Is it tag per million?


juheon.maeng • 0

@juheonmaeng-22220


Hi, I am using the DESeq2 (DESeq2_1.22.2) VST algorithm to normalize the tag count within peaks from CAGE-seq data. I want to use the VST transformed counts in peaks to see the change of peak activity across cell lines and to determine the cell line-specific peaks. I want to "normalize counts" across samples for cross-sample comparison of peak activity and want to have "normalized counts per million" to determine cell-line specific peaks which are >1 TPM.

I thought the VST-transformed read count was the right way to go because the VST takes the size factors and dispersion into account when normalizing the counts, and the unit of the VST-transformed read count is "count-per-million" (according to the post by Ryan C. Thompson at https://support.bioconductor.org/p/65510/).

However, when I summed the VST-normalized peak counts per cell, the sums were in the range of 10-20 million, which is 10-20 times larger than I expected.

Here are my questions. 1) Is the unit of the VST-normalized peak count "count-per-million"? If so, what are possible explanations for my 10-20 million VST-transformed read counts per cell? 2) What is the pseudocount used in VST? I couldn't find a pseudocount for VST in the DESeq2 documentation. Is there no pseudocount for VST?

Best regards, Ju Heon Maeng

deseq2 vst rlog • 4.1k views

updated 5.7 years ago by Michael Love 43k • written 5.7 years ago by juheon.maeng • 0

score 0 · Answer 1 · 2019-10-25

VST is approximately log2 of scaled counts (as the counts become larger it converges to this).

So it's not CPM or anything like it. The counts are scaled (by the size factors) to the middle of the range of sequencing depths in your dataset. A VST value is therefore roughly the log2 of the count you would have observed if that sample had been sequenced at a mid-range depth for the dataset.
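To make the difference from CPM concrete, here is a minimal Python sketch of the large-count approximation only, i.e. log2(count / size factor). It is not the VST itself and not DESeq2 code: the toy counts and the helper names `size_factors`, `log2_scaled`, and `cpm` are made up for illustration, and the size factors are computed with the median-of-ratios method that DESeq2 uses by default.

```python
import math
from statistics import median

# Hypothetical toy data: rows are peaks, columns are samples A and B.
# Sample B is sequenced exactly twice as deep as sample A.
counts = {
    "peak1": [100, 200],
    "peak2": [500, 1000],
    "peak3": [2000, 4000],
}


def size_factors(counts):
    """Median-of-ratios size factors, one per sample."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean of each peak across samples; peaks with a zero are skipped.
    log_geo_means = {
        peak: sum(math.log(c) for c in row) / n_samples
        for peak, row in counts.items()
        if all(c > 0 for c in row)
    }
    return [
        math.exp(median(math.log(counts[peak][j]) - lgm
                        for peak, lgm in log_geo_means.items()))
        for j in range(n_samples)
    ]


def log2_scaled(counts, sf):
    """log2(count / size factor): what the VST approaches for large counts.

    The toy data has no zeros; the real VST treats low counts differently.
    """
    return {peak: [math.log2(c / s) for c, s in zip(row, sf)]
            for peak, row in counts.items()}


def cpm(counts):
    """Counts per million, for comparison: count / library size * 1e6."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {peak: [c / t * 1e6 for c, t in zip(row, totals)]
            for peak, row in counts.items()}


sf = size_factors(counts)
scaled = log2_scaled(counts, sf)
per_million = cpm(counts)

for peak in counts:
    print(peak, [round(v, 2) for v in scaled[peak]],
          [round(v) for v in per_million[peak]])
```

In this toy case the size factors centre on 1 (their geometric mean is 1), so the scaled counts stay on the scale of the raw counts at a mid-range depth rather than being divided by library size. Undoing the log with `2 ** value` therefore gives a depth-adjusted count, not a per-million value, which is why a ">1 TPM" threshold cannot be applied to these numbers directly.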

There is no pseudocount used in VST. See the DESeq (2010) paper for a description of the transformation.