Hello All,
I am trying to reproduce an analysis that a former colleague of mine produced, because we are not able to reach him. He normalized some RNA-seq data using the VST in DESeq2 with default parameters. However, if I input the same count matrix into the latest DESeq2 version, I don't get the same normalized values. Moreover, if I use the VST in DESeq, I get values that differ from both my DESeq2 run and his run. So I guess the calculations have evolved over time. I admit the numbers are close, but they are not exactly the same, and I need exact values for the reproduction. My question is: are there versions of DESeq2 in which the default VST calculation changed slightly, which could explain this discrepancy? Any help is appreciated (and if so, does anyone know in which versions those changes occurred?)
Thanks
Hi Michael,
Thanks a lot. I will check the NEWS and updates. To be more specific:
Suppose we have a count matrix, count_matrix, with m genes and n samples.
In DESeq version 1.28.0, suppose I run this:
And in DESeq2 version 1.16.1:
Using the above code, should I expect vsd_deseq and vsd_deseq2 to be the same? They were not. What do you think causes the difference?
DESeq2 uses the Cox-Reid adjusted profile likelihood to obtain per-gene dispersion estimates. These estimates are then used to fit the dispersion trend, and that trend is what the VST uses. (The per-gene dispersion estimation follows the Cox-Reid adjustment first implemented in edgeR, as described in the edgeR papers; see the DESeq2 paper or vignette for full details of the DESeq2 estimation steps and citations to the literature.)
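To make the Cox-Reid step concrete, here is a minimal Python sketch for a single gene under an intercept-only design. It maximizes the negative binomial log-likelihood minus 0.5·log det(XᵀWX), where W holds the log-link GLM weights μ_j/(1+αμ_j). This is not DESeq2's code: the function names are hypothetical, μ_j is held fixed at s_j times the mean normalized count (a simplification), and a plain golden-section search over log α stands in for DESeq2's own optimizer.

```python
import math


def nb_loglik(y, mu, alpha):
    """Negative binomial log-likelihood, mean mu, dispersion alpha (Var = mu + alpha * mu^2)."""
    r = 1.0 / alpha  # NB "size" parameter
    total = 0.0
    for yj, mj in zip(y, mu):
        total += (math.lgamma(yj + r) - math.lgamma(r) - math.lgamma(yj + 1)
                  - r * math.log1p(alpha * mj)
                  + yj * math.log(alpha * mj / (1.0 + alpha * mj)))
    return total


def cr_adjusted_loglik(y, mu, alpha):
    """Cox-Reid adjusted profile log-likelihood for an intercept-only design.

    With X a single column of ones, X^T W X is just the sum of the
    log-link GLM weights w_j = mu_j / (1 + alpha * mu_j).
    """
    weight_sum = sum(m / (1.0 + alpha * m) for m in mu)
    return nb_loglik(y, mu, alpha) - 0.5 * math.log(weight_sum)


def estimate_dispersion_cr(counts, size_factors, lo=1e-8, hi=10.0, iters=100):
    """Hypothetical helper: maximize the CR-adjusted likelihood over log(alpha).

    Assumes mu_j = s_j * (mean normalized count) stays fixed and that the
    gene has at least one nonzero count.
    """
    q = sum(c / s for c, s in zip(counts, size_factors)) / len(counts)
    mu = [s * q for s in size_factors]

    def objective(log_alpha):
        return cr_adjusted_loglik(counts, mu, math.exp(log_alpha))

    a, b = math.log(lo), math.log(hi)
    ratio = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if objective(c) > objective(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return math.exp((a + b) / 2)
```

For example, estimate_dispersion_cr([2, 50, 10, 100], [1.0] * 4) returns a much larger dispersion than estimate_dispersion_cr([40, 42, 38, 41], [1.0] * 4), whose counts vary less than Poisson noise would predict.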
DESeq, by default, calculated a pooled estimate of dispersion from the per-group means and variances and then performed a bias adjustment. You can read the DESeq paper for details. Also, this section describes the changes from DESeq to DESeq2.
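For contrast, a moment-based estimate in the spirit of DESeq's pooled approach can be sketched as: pooled within-group variance of the normalized counts, minus the expected Poisson part, divided by the squared mean. This is a simplified illustration only. It omits DESeq's bias adjustment and trend fitting, uses the overall mean for the Poisson term, and pooled_moment_dispersion is a hypothetical name, not a DESeq function.

```python
def pooled_moment_dispersion(counts, size_factors, groups):
    """Simplified method-of-moments dispersion from a pooled within-group variance.

    Assumes more samples than groups and a positive mean normalized count.
    """
    norm = [c / s for c, s in zip(counts, size_factors)]  # normalized counts
    n = len(norm)
    labels = sorted(set(groups))
    # within-group means of the normalized counts
    means = {
        g: sum(x for x, gg in zip(norm, groups) if gg == g) / groups.count(g)
        for g in labels
    }
    # pooled within-group variance (degrees of freedom: n - number of groups)
    ss = sum((x - means[g]) ** 2 for x, g in zip(norm, groups))
    w = ss / (n - len(labels))
    q = sum(norm) / n  # overall mean of normalized counts
    z = q * sum(1.0 / s for s in size_factors) / n  # expected Poisson (shot-noise) variance
    # negative raw estimates are clamped to zero here for simplicity
    return max((w - z) / q ** 2, 0.0)
```

Because this estimator and the Cox-Reid likelihood respond differently to the same counts, the two packages end up with different per-gene dispersions before any trend is fitted.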
These are very different methods for estimating dispersion, and the VST is based on the trend line fitted through these estimates, so it is expected that two different software packages would not give the same transformed data.
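To see why different dispersion estimates propagate into the transformed values, consider the parametric trend α(μ) = a0 + a1/μ. As far as I can tell from the DESeq2 source, for fitType = "parametric" the VST is a closed form in these two coefficients (stored there as asymptDisp and extraPois). The sketch below reproduces that formula in Python. It follows from integrating 1/√Var(μ) with Var(μ) = (1 + a1)μ + a0μ², rescaled so that large counts map to roughly log2(q). The coefficient values are made up for illustration.

```python
import math


def parametric_vst(q, asympt_disp, extra_pois):
    """VST for the parametric dispersion trend alpha(mu) = asympt_disp + extra_pois / mu.

    Integrates 1 / sqrt(Var(mu)) with Var(mu) = (1 + extra_pois) * mu + asympt_disp * mu^2,
    then rescales and shifts so that large normalized counts map to about log2(q).
    """
    a0, a1 = asympt_disp, extra_pois
    numerator = 1 + a1 + 2 * a0 * q + 2 * math.sqrt(a0 * q * (1 + a1 + a0 * q))
    return math.log(numerator / (4 * a0)) / math.log(2)


# Two hypothetical trend fits with slightly different coefficients
for q in (0, 10, 100, 1000):
    print(q, parametric_vst(q, 0.05, 1.0), parametric_vst(q, 0.06, 1.2))
```

The two columns differ most at low counts and converge toward log2(q) at high counts, which matches the pattern of "close but not identical" values when the fitted trend changes.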