Question: Deseq2 results discrepancies because of versions of R
gravatar for genomica
22 months ago by
United Kingdom
genomica0 wrote:

Hello all,

I was asked to repeat RNAseq analysis done a while ago. Previously, analysis was done using Deseq2 v1.2.10 and R v.3.0.2. Although we are still using the same version for Deseq2, R on our system is now upgraded to 3.2.1. Repeating exactly same analysis using the same code but with newer version of R, gives me half the number of genes as significantly expressed. 

Does anyone know why this discrepancy could be?

Many thanks for any help you can provide.

ADD COMMENTlink modified 22 months ago by Michael Love13k • written 22 months ago by genomica0
gravatar for Michael Love
22 months ago by
Michael Love13k
United States
Michael Love13k wrote:

Version 1.0 and 1.2 were the first 2 releases of DESeq2, in April and October 2013. We received a lot of feedback from users and so v1.4 incorporated those changes. Since v1.4, the methods for parameter estimation have remained much more stable, which leads to more stable list of DEG. Since v1.4 we focus on usability and extra features, but not on changing methods for parameter estimation. All important changes are documented in the NEWS file accompanying the package (although this might not be useful as it's hard to tell which changes are the ones that contributed to a difference). 


General advice on why numbers of DEG change with small changes in parameters:

For more detail on how the number of DEG could have changed, it's important to remember that p-values are tail probabilities. This makes them especially sensitive to change in how you estimate parameters. 

For an example of how tail probabilities are sensitive, suppose we observe 10,000 random variables from a standard normal distribution. How many of these are in the lower 1% tail of the distribution we drew these from?

> set.seed(1)
> z <- rnorm(10000, 0, 1)
> sum(pnorm(z, 0, 1) < .01)
[1] 115

We expect this to be around 100. Note there is already variance because Z is a random sample.

Now, suppose we make a 5% change in the standard deviation of the distribution we use to evaluate Z:

> sum(pnorm(z, 0, 0.95) < .01)
[1] 167

After a 5% change in a parameter -- here, underestimating the standard deviation -- we observe a 45% increase in the number of random variables in the lower tail. 

What if we increase the parameter by 5%:

> sum(pnorm(z, 0, 1.05) < .01)
[1] 83

This gives a 28% decrease in the number in the lower tail.

My point with this example is that small changes in parameters are amplified when we compute the number of observations (for differential expression, read: number of genes), that fall into the tails.

ADD COMMENTlink modified 19 months ago • written 22 months ago by Michael Love13k

Thank a lot for your explanation, Michael. Really helpful! 

ADD REPLYlink written 22 months ago by genomica0
gravatar for genomica
22 months ago by
United Kingdom
genomica0 wrote:

Please ignore the previous post. It was my bad in noting the versions of Deseq2. Its v1.2.10 for R3.0.2 and v1.10 for R3.2.0. Sorry!

ADD COMMENTlink written 22 months ago by genomica0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 140 users visited in the last hour