Search
Question: Deseq2 results discrepancies because of versions of R
0
gravatar for genomica
19 months ago by
genomica0
United Kingdom
genomica0 wrote:

Hello all,

I was asked to repeat RNAseq analysis done a while ago. Previously, analysis was done using Deseq2 v1.2.10 and R v.3.0.2. Although we are still using the same version for Deseq2, R on our system is now upgraded to 3.2.1. Repeating exactly same analysis using the same code but with newer version of R, gives me half the number of genes as significantly expressed. 

Does anyone know why this discrepancy could be?

Many thanks for any help you can provide.

ADD COMMENTlink modified 19 months ago by Michael Love12k • written 19 months ago by genomica0
2
gravatar for Michael Love
19 months ago by
Michael Love12k
United States
Michael Love12k wrote:

Version 1.0 and 1.2 were the first 2 releases of DESeq2, in April and October 2013. We received a lot of feedback from users and so v1.4 incorporated those changes. Since v1.4, the methods for parameter estimation have remained much more stable, which leads to more stable list of DEG. Since v1.4 we focus on usability and extra features, but not on changing methods for parameter estimation. All important changes are documented in the NEWS file accompanying the package (although this might not be useful as it's hard to tell which changes are the ones that contributed to a difference). 

~~~

General advice on why numbers of DEG change with small changes in parameters:

For more detail on how the number of DEG could have changed, it's important to remember that p-values are tail probabilities. This makes them especially sensitive to change in how you estimate parameters. 

For an example of how tail probabilities are sensitive, suppose we observe 10,000 random variables from a standard normal distribution. How many of these are in the lower 1% tail of the distribution we drew these from?

> set.seed(1)
> z <- rnorm(10000, 0, 1)
> sum(pnorm(z, 0, 1) < .01)
[1] 115

We expect this to be around 100. Note there is already variance because Z is a random sample.

Now, suppose we make a 5% change in the standard deviation of the distribution we use to evaluate Z:

> sum(pnorm(z, 0, 0.95) < .01)
[1] 167

After a 5% change in a parameter -- here, underestimating the standard deviation -- we observe a 45% increase in the number of random variables in the lower tail. 

What if we increase the parameter by 5%:

> sum(pnorm(z, 0, 1.05) < .01)
[1] 83

This gives a 28% decrease in the number in the lower tail.

My point with this example is that small changes in parameters are amplified when we compute the number of observations (for differential expression, read: number of genes), that fall into the tails.

ADD COMMENTlink modified 16 months ago • written 19 months ago by Michael Love12k

Thank a lot for your explanation, Michael. Really helpful! 

ADD REPLYlink written 19 months ago by genomica0
0
gravatar for genomica
19 months ago by
genomica0
United Kingdom
genomica0 wrote:

Please ignore the previous post. It was my bad in noting the versions of Deseq2. Its v1.2.10 for R3.0.2 and v1.10 for R3.2.0. Sorry!

ADD COMMENTlink written 19 months ago by genomica0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 163 users visited in the last hour