Question: deseq2 vst transformation
20 months ago, aa1201 wrote:

Hello All,

 

I am trying to reproduce an analysis by a former colleague whom we can no longer reach. He normalized some RNA-seq data using the VST in DESeq2 with default parameters. However, when I pass the same count matrix to the latest DESeq2 version, I don't get the same normalized values. Moreover, the VST in DESeq gives values that differ from both my DESeq2 run and my colleague's run, so I guess the calculations evolved over time. The numbers are close, but not exactly the same, and I need them to match exactly for the reproduction. Are there versions of DESeq2 in which the default VST calculation changed slightly, which could explain this discrepancy? If so, does anyone know in which versions those changes occurred? Any help is appreciated.


Thanks

Answer: deseq2 vst transformation
20 months ago, Michael Love (United States) wrote:

Please post all of the R code used and the versions of DESeq2 that were used. This information is key to knowing how to reproduce numerical values exactly. Given the same R code and knowing the version of DESeq2, it's trivial to recreate numerical values exactly because all release candidates of Bioconductor packages are saved online.

On the software side, we keep track of all changes in the NEWS file, which is available on the website or via the news() function within R.
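As a minimal sketch of what to record and where to look, the snippet below prints the R version and the installed versions of DESeq and DESeq2, then shows the DESeq2 change log. It only uses standard R utilities (`requireNamespace()`, `packageVersion()`, `news()`); the variable names are illustrative, and packages that are not installed are simply skipped.

```r
# Packages whose versions matter for reproducing a VST run.
pkgs <- c("DESeq", "DESeq2")

# Keep only the packages that are actually installed in this R library.
installed <- pkgs[vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Record the exact version of each installed package as a string.
versions <- vapply(installed,
                   function(p) as.character(packageVersion(p)),
                   character(1))

print(R.version.string)
print(versions)

# Browse the NEWS entries for DESeq2, if it is installed.
if ("DESeq2" %in% installed) {
  print(news(package = "DESeq2"))
}
```

Running this in both the old and the new environment, and posting the output together with the R code, gives the information asked for above.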


Hi Michael, 

Thanks a lot. I will check the news and updates. To be more specific:

Suppose we have a count_matrix of m genes and n samples.

In DESeq version 1.28.0, suppose I run this:


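library(DESeq)
# Build the CountDataSet from raw counts and the sample conditions
cds <- newCountDataSet(count_matrix, phenotype)
cds <- estimateSizeFactors(cds)
# "blind" ignores the sample labels when estimating dispersions
cds <- estimateDispersions(cds, method = "blind")
vsd_deseq <- getVarianceStabilizedData(cds)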

In DESeq2 version 1.16.1:


library(DESeq2)
vsd_deseq2 <- varianceStabilizingTransformation(count_matrix)

Using the above code, should I expect vsd_deseq and vsd_deseq2 to be the same? They were not. What do you think causes the difference?
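To put a number on "close but not exactly the same", a comparison along the following lines may help. This is only a sketch in base R: `compare_vst` is a hypothetical helper, and the two toy matrices stand in for the real vsd_deseq and vsd_deseq2 results.

```r
# Hypothetical helper: summarize how far apart two transformed matrices are.
compare_vst <- function(a, b, tolerance = 1e-8) {
  stopifnot(identical(dim(a), dim(b)))
  list(
    identical    = identical(a, b),                          # bit-for-bit equality
    all_equal    = isTRUE(all.equal(a, b, tolerance = tolerance)),
    max_abs_diff = max(abs(a - b)),                          # largest single deviation
    cor          = cor(as.vector(a), as.vector(b))           # overall agreement
  )
}

# Toy stand-ins for the two VST results (made-up values, 4 genes x 3 samples).
set.seed(1)
vsd_a <- matrix(rnorm(12, mean = 8, sd = 2), nrow = 4, ncol = 3)
vsd_b <- vsd_a + 1e-3   # a small constant offset, for illustration only

res <- compare_vst(vsd_a, vsd_b)
print(res)
```

A high correlation together with a nonzero maximum difference would be consistent with two methods that agree closely without being numerically identical.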

 

Comment written 20 months ago by aa1201

DESeq2 uses the Cox-Reid adjusted likelihood to obtain the per-gene dispersion estimates, which are used to fit the trend, which in turn is used in the VST. The per-gene dispersion estimation follows the Cox-Reid adjustment first implemented in the edgeR papers; see the DESeq2 paper or vignette for full details of the estimation steps and citations to the literature.

By default, DESeq calculated a pooled estimate of dispersion using the means and variances for each group, and then performed an adjustment for bias. You can read the DESeq paper for details. Also, this section describes changes from DESeq to DESeq2.

These are very different methods for estimating dispersion, and the VST is based on the trend line that goes through these estimates, so it is expected that two different software packages will not give the same transformed data.
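To illustrate why a different dispersion trend changes the transformed values, here is a small base R sketch. It assumes a parametric trend of the form alpha(mu) = a0 + a1/mu, so the variance is v(mu) = (1 + a1)*mu + a0*mu^2, and uses the closed form obtained by integrating 1/sqrt(v(mu)), rescaled so that it approaches log2(q) for large counts. The function name `vst_parametric` and the coefficient values are made up for illustration and are not fitted from any data; the real packages estimate the trend from the counts.

```r
# Closed-form VST for an assumed parametric trend alpha(mu) = a0 + a1 / mu.
# Rescaled so that the result approaches log2(q) as q grows.
vst_parametric <- function(q, a0, a1) {
  log2((1 + a1 + 2 * a0 * q + 2 * sqrt(a0 * q * (1 + a1 + a0 * q))) / (4 * a0))
}

q <- c(1, 10, 100, 1000)   # normalized counts

# Two hypothetical trend fits, as two different dispersion methods might give.
vst_fit_a <- vst_parametric(q, a0 = 0.10, a1 = 1.0)
vst_fit_b <- vst_parametric(q, a0 = 0.12, a1 = 0.8)

# Same counts, different trend coefficients, different transformed values.
print(round(cbind(q, vst_fit_a, vst_fit_b, diff = vst_fit_a - vst_fit_b), 3))
```

The gap between the two columns is largest at low counts, where the trend has the most influence, and shrinks as both curves approach log2(q).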

Reply written 20 months ago by Michael Love