Hi, I am using the DESeq2 (version 1.22.2) VST algorithm to normalize tag counts within peaks from CAGE-seq data. I want to use the VST-transformed peak counts for two things: to compare peak activity across cell lines, which requires counts normalized across samples, and to identify cell line-specific peaks (peaks with >1 TPM), which requires normalized counts per million.
I thought the VST-transformed read count was the right way to go because the VST takes the size factors and dispersion into account when normalizing the counts, and because the unit of the VST-transformed read count is "count-per-million" (according to the post by Ryan C. Thompson at https://support.bioconductor.org/p/65510/).
However, when I summed the VST-normalized peak counts for each cell line, the totals were in the range of 10-20 million, which is 10-20 times larger than I expected.
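To make the expectation concrete, here is a minimal sketch in base R of the arithmetic behind it. The count matrix and the sample names `cellA` and `cellB` are made up for illustration and are not my data; the point is only that values expressed in counts per million should sum to one million per sample.

```r
# Toy tag-count matrix: 4 peaks (rows) x 2 samples (columns).
# All values are hypothetical and chosen only for illustration.
counts <- matrix(c(10, 200, 0, 790,
                   50, 400, 5, 1545),
                 nrow = 4,
                 dimnames = list(paste0("peak", 1:4), c("cellA", "cellB")))

# Counts per million: divide each column by its library size, then scale by 1e6.
lib_size <- colSums(counts)
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# If values are in CPM units, each sample's column total is one million.
cpm_totals <- colSums(cpm)
print(cpm_totals)

# Peaks above the 1 CPM threshold in one sample.
active_in_A <- rownames(cpm)[cpm[, "cellA"] > 1]
print(active_in_A)
```

The same `colSums()` check applied to the matrix of VST values is what gave me the 10-20 million totals.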
Here are my questions. 1) Is the unit of the VST-normalized peak count "count-per-million"? If so, what are possible explanations for the 10-20 million total VST-transformed read count per cell line? 2) What is the pseudocount used in the VST? I couldn't find a pseudocount for the VST in the DESeq2 documentation. Does the VST not use a pseudocount?
Best regards, Ju Heon Maeng