DESeq2: different p-values between v1.18 and v1.20 (is it minmu?)
@andrebolerbarros-16788

Hello everyone,

I previously ran an RNA-seq pipeline analysis with DESeq2 version 1.18; now, with version 1.20, I am getting different p-values, which ultimately changes the number of DE genes (~200 genes differ).

I was reading about the changes between versions, and the only one I think could be involved is the "minmu" argument of DESeq, which was not present in 1.18. Could this be the answer?

Thanks!

Have a look here; there is a reference to that parameter:

https://github.com/mikelove/DESeq2/blob/a24c0bd71fb4a621a7c0772ca00825db5af5c69b/NEWS#L26

 

Thanks for your answer! I had seen that; that's how I arrived at the minmu hypothesis. But I would like to understand the change, or to find out what the default value was in v1.18.

I just started doing some experiments, and it does seem to be "minmu" that is changing these values. Is there any information about the previous value and/or about the change?
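For reference, the test was along these lines (just a sketch; dds is the DESeqDataSet from our pipeline, and the alternative minmu value is arbitrary):

> dds_a <- DESeq(dds, minmu = 0.5, quiet = TRUE)  # the v1.20 default
> dds_b <- DESeq(dds, minmu = 0.1, quiet = TRUE)  # arbitrary lower value, to see if results move
> summary(abs(results(dds_a)$pvalue - results(dds_b)$pvalue))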

@mikelove

I didn't change the value of minmu (it was 0.5 before and after); I only elevated it to an argument rather than an internal parameter. It was exposed for the single-cell integration.

So that's not the cause of any changes in p-values. I can't think of any difference between these versions that would explain this. Can you give summary(abs(res$stat - res2$stat))?
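If you want to rule minmu out on your own data, you can set it explicitly; 0.5 is both the argument's default and the old internal value, so in v1.20 this should reproduce the plain call exactly (a sketch, with dds as your dataset):

> res_a <- results(DESeq(dds, quiet = TRUE))
> res_b <- results(DESeq(dds, minmu = 0.5, quiet = TRUE))  # explicit old internal value
> all.equal(res_a$pvalue, res_b$pvalue)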


I happen to have both versions on my laptop, and I get differences of at most 2 x 10^-5 in adjusted p-value. I don't think there was any relevant change in the statistical routines between these versions (there was a bug in the single-cell integration that I fixed, but you didn't mention using ZINB-WaVE estimated weights).

Is it possible you were using a version earlier than 1.18?

R 3.4:

> packageVersion("DESeq2")
[1] ‘1.18.1’
> set.seed(1)
> dds <- makeExampleDESeqDataSet()
> dds <- DESeq(dds, quiet=TRUE)
> res <- results(dds)
> save(dds, res, file="deseq2_v1.18.rda")

R 3.5:

> packageVersion("DESeq2")
[1] ‘1.20.0’
> load("deseq2_v1.18.rda")
> dds <- DESeq(dds, quiet=TRUE)
> res2 <- results(dds)
> summary(abs(res$stat - res2$stat))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
0.00e+00 6.00e-07 2.00e-06 3.00e-06 4.30e-06 2.23e-05        2 
> summary(abs(res$pvalue - res2$pvalue))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.0e+00 3.0e-07 1.0e-06 1.2e-06 1.8e-06 5.3e-06       3 
> summary(abs(res$padj - res2$padj))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
0.0e+00 2.0e-07 2.0e-07 5.0e-07 2.0e-07 2.3e-05       3 

 


So, I did what you suggested (with some additions). I still had the environment built with v1.18, so I ran the v1.20 analysis and then compared.

First, I compared the values of the columns:

> check1 <- vector()
> for (i in 1:ncol(res_20)) {
+   check1[i] <- all(res_18[, i] == res_20[, i], na.rm = TRUE)
+ }
> check1
[1]  TRUE FALSE FALSE FALSE FALSE FALSE
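(The same check written more compactly, for the record:)

> sapply(seq_len(ncol(res_20)), function(i) all(res_18[, i] == res_20[, i], na.rm = TRUE))
[1]  TRUE FALSE FALSE FALSE FALSE FALSE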

Now, as suggested, I computed the difference between the results.

(First of all, checking that the rownames match exactly:)

> check2 <- all(rownames(res_18) == rownames(res_20))
> check2
[1] TRUE

Now, the difference:

> dif <- as.data.frame(res_18)  # pre-allocate a data frame of matching dimensions
> for (i in 1:nrow(res_18)) {
+   dif[i, ] <- res_20[i, ] - res_18[i, ]
+ }
> dif <- dif[order(dif$padj, decreasing = TRUE), ]
> head(dif)
                   baseMean log2FoldChange        lfcSE          stat       pvalue
ENSMUSG00000096992        0   5.169783e-04 1.687938e-03  0.0040201034 2.019885e-03
ENSMUSG00000029190        0   5.193889e-08 2.530548e-05 -0.0001695293 9.630470e-05
ENSMUSG00000112846        0  -5.263351e-06 1.690702e-04  0.0001355298 7.715936e-05
ENSMUSG00000083431        0   5.882076e-05 3.126852e-04  0.0002190303 1.229127e-04
ENSMUSG00000081093        0  -9.029549e-06 1.883606e-04 -0.0001652798 9.436838e-05
ENSMUSG00000093405        0  -1.054840e-05 1.083485e-04 -0.0001479438 8.400461e-05
                         padj
ENSMUSG00000096992 0.04924934
ENSMUSG00000029190 0.04891005
ENSMUSG00000112846 0.04887932
ENSMUSG00000083431 0.04887640
ENSMUSG00000081093 0.04887206
ENSMUSG00000093405 0.04886462
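
(The loop above can also be written as one vectorized subtraction, assuming both results objects coerce cleanly to data frames:)

> dif <- as.data.frame(res_20) - as.data.frame(res_18)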

As you can see, there are big differences in the adjusted p-values, which can produce clear differences in the DE gene lists between versions.

The next step is to redo everything as you did (R 3.4, DESeq2 v1.18, the version I used the first time) and then compare with the current versions I have (R 3.5, DESeq2 v1.20).


I just redid the analysis with the different versions (R 3.4, DESeq2 v1.18.1 versus R 3.5, DESeq2 v1.20.0) and the difference remains.


As you requested:

> summary(dif$stat)
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-4.576e-02 -8.796e-05  0.000e+00 -1.383e-05  6.629e-05  5.963e-02 

and also:

> summary(dif$padj)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   0.000   0.009   0.017   0.034   0.049    9788 

Note the p-values are only different by at most 0.002, right? I'm surprised this aggregates to such a big difference in adjusted p-values, but it's possible given the nature of the method.

Moving forward, the results are not guaranteed to be identical across versions. Unless there was a regression, which I don't think there was, why don't you stick with one version for the analysis?
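Back-of-the-envelope, with made-up numbers of roughly your scale: BH multiplies each p-value by n/rank before taking the step-up minimum, so a p-value shift of d can move an adjusted value by up to about d * n / rank.

> n <- 20000  # genes tested (made-up)
> r <- 800    # rank of a gene near the threshold (made-up)
> d <- 0.002  # p-value shift between versions
> d * n / r   # the BH-adjusted value can shift by up to this much
[1] 0.05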
