Question

deseq2 vst transformation

aa1201 • 0

@aa1201-14044

Hello All,

I am trying to reproduce an analysis that a former colleague of mine produced, and we are not able to reach him. He normalized some RNA-seq data using the VST in DESeq2 with default parameters. However, if I input the same count matrix to the latest DESeq2 version, I don't get the same normalized values. Moreover, if I use the VST in DESeq, I get values different from both my DESeq2 run and my colleague's run. So I guess the calculations evolved over time. I admit the numbers are close, but they are not exactly the same, which is what I need for reproduction. My question: are there versions of DESeq2 with slight changes to the way the default VST is calculated that could explain this discrepancy? If so, does anyone know in which versions those changes occurred? Any help is appreciated.

Thanks

deseq2 variancestabilizingtransformation • 2.2k views

ADD COMMENT • link updated 6.6 years ago by Michael Love 41k • written 6.6 years ago by aa1201 • 0

score 0 · Answer 1 · 2017-09-26

Michael Love 41k

@mikelove

Please post all of the R code used and the versions of DESeq2 that were used. This information is key to knowing how to reproduce numerical values exactly. Given the same R code and knowing the version of DESeq2, it's trivial to recreate numerical values exactly because all release candidates of Bioconductor packages are saved online.

On the software side, we keep track of all changes in the NEWS file, which is available on the website or via the news() function within R.
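As a minimal sketch of what to record and where to look, the snippet below prints the R and DESeq2 versions and filters the package change log for VST-related entries. The search pattern is only an illustrative assumption, and the DESeq2-specific calls are guarded so the snippet still runs where the package is not installed.

```r
# Record the R version so a later run can be matched to this one.
r_version <- R.version.string
print(r_version)

# DESeq2 may not be installed everywhere, so guard the package-specific calls.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  print(packageVersion("DESeq2"))
  # Show NEWS entries whose text mentions the VST (pattern is illustrative).
  print(news(grepl("variance|vst", Text, ignore.case = TRUE),
             package = "DESeq2"))
}

# Full list of attached packages and their versions.
print(sessionInfo())
```

Posting this output together with the analysis code makes it much easier to tell which release produced a given set of values.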

ADD COMMENT • link 6.6 years ago Michael Love 41k

Hi Michael,

Thanks a lot. I will check the news and updates. To be more specific:

Suppose we have a count_matrix of m genes and n samples.

In DESeq version 1.28.0, suppose I run this:


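library(DESeq)
# phenotype: condition labels, one entry per column of count_matrix
cds <- newCountDataSet(count_matrix, phenotype)
cds <- estimateSizeFactors(cds)
# "blind" estimation ignores the sample labels
cds <- estimateDispersions(cds, method = "blind")
vsd_deseq <- getVarianceStabilizedData(cds)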

In DESeq2 version 1.16.1:


library(DESeq2)
vsd_deseq2 <- varianceStabilizingTransformation(count_matrix)

Should I expect vsd_deseq and vsd_deseq2 from the code above to be the same? They were not. What do you think causes the difference?
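One way to put numbers on "close but not exactly the same" is sketched below in base R. The two toy matrices are hypothetical stand-ins for vsd_deseq and vsd_deseq2, and compare_vst is a helper name made up for this illustration, not part of either package.

```r
# Toy stand-ins for two VST outputs (hypothetical values, genes x samples).
set.seed(42)
vsd_a <- matrix(rnorm(20, mean = 8), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
vsd_b <- vsd_a + 0.01  # a small systematic shift, for illustration only

# Hypothetical helper: summarize how far apart two matrices are.
compare_vst <- function(a, b, tol = 1e-8) {
  stopifnot(identical(dim(a), dim(b)))
  list(
    identical        = identical(a, b),                           # bit-for-bit
    equal_within_tol = isTRUE(all.equal(a, b, tolerance = tol)),  # relative tolerance
    max_abs_diff     = max(abs(a - b)),                           # worst single entry
    correlation      = cor(as.vector(a), as.vector(b))            # overall agreement
  )
}

res <- compare_vst(vsd_a, vsd_b)
str(res)
```

A high correlation with a non-zero maximum difference is what one would expect to see when two runs use slightly different estimation steps rather than different input data.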

ADD REPLY • link 6.6 years ago aa1201 • 0

DESeq2 uses the Cox-Reid adjusted likelihood to obtain the dispersion estimates, which are then used to fit the trend, which is in turn used in the VST. The per-gene dispersion estimation follows the Cox-Reid adjustment first implemented in the edgeR papers; see the DESeq2 paper or vignette for the full details of the DESeq2 estimation steps and citations to the literature.

By default, DESeq calculated a pooled estimate of dispersion using the means and variances for each group, and then performed an adjustment for bias. You can read the DESeq paper for details. Also, this section describes changes from DESeq to DESeq2.

But these are very different methods for estimating dispersion, and the VST is based on the trend line that goes through these estimates, so with two different software packages you should not expect to get the same transformed data.
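For the DESeq2 side, the chain described above (per-gene dispersion estimates, then the fitted trend, then the VST) can be made explicit with the rough sketch below. It runs on simulated counts from makeExampleDESeqDataSet rather than the original data, which is an assumption made purely for illustration, and it is guarded so it is skipped where DESeq2 is not installed.

```r
if (requireNamespace("DESeq2", quietly = TRUE)) {
  library(DESeq2)
  set.seed(1)

  # Simulated data set: 1000 genes, 6 samples (illustrative, not the real data).
  dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

  # Step 1: size factors. Step 2: dispersion estimates and fitted trend.
  dds <- estimateSizeFactors(dds)
  dds <- estimateDispersions(dds)

  # The fitted dispersion-mean trend that the VST is built from.
  trend <- dispersionFunction(dds)
  print(attr(trend, "coefficients"))

  # Step 3: VST reusing the trend fitted above (blind = FALSE).
  vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
  vsd_mat <- assay(vsd)
  print(head(vsd_mat))
}
```

Comparing the printed trend coefficients between two DESeq2 installations is one way to see whether a difference in transformed values originates in the dispersion fit.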

ADD REPLY • link 6.6 years ago Michael Love 41k