Question: DESeq2 - v1.18 different p-values from v1.20 (Is it minmu?)
13 months ago by
andrebolerbarros10 wrote:

Hello everyone,

I previously ran an RNA-seq pipeline analysis with DESeq2 version 1.18; now, with version 1.20, I am getting different p-values, which ultimately changes the number of DE genes (~200 genes differ).

I was reading about the changes between versions, and the only one I think could be involved is the "minmu" argument of DESeq, which was not present in 1.18. Could this be the answer?

Thanks!

modified 13 months ago by Michael Love25k • written 13 months ago by andrebolerbarros10

Have a look here, there is a reference to that parameter.

https://github.com/mikelove/DESeq2/blob/a24c0bd71fb4a621a7c0772ca00825db5af5c69b/NEWS#L26

Thanks for your answer! I saw that; that's how I arrived at the minmu hypothesis. But I would like to understand the change, or at least to know the default value in v1.18.

I just started doing some experiments, and it does seem to be "minmu" that is changing these values. Is there any information about the previous value and/or about the change?

Answer: DESeq2 - v1.18 different p-values from v1.20 (Is it minmu?)
13 months ago by
Michael Love25k
United States
Michael Love25k wrote:

I didn’t change the value of minmu (it was 0.5 before and after); I only elevated it to a user-facing argument rather than an internal parameter. It was exposed for the single-cell integration.

So that’s not the cause of any changes in p-values, and I can’t think of any other difference between these versions. Can you post summary(abs(res$stat - res2$stat))?

I happen to have both versions on my laptop, and I get at most 2 x 10^-5 differences in adjusted p-value. I don't think there was any relevant change in the statistical routine between these versions (there was a bug in the single cell integration that I fixed, but you didn't mention using ZINB-WaVE estimated weights).

Is it possible you were using a version earlier than 1.18?

R 3.4:

> packageVersion("DESeq2")
[1] ‘1.18.1’
> set.seed(1)
> dds <- makeExampleDESeqDataSet()
> dds <- DESeq(dds, quiet=TRUE)
> res <- results(dds)
> save(dds, res, file="deseq2_v1.18.rda")

R 3.5:

> packageVersion("DESeq2")
[1] ‘1.20.0’
> load("deseq2_v1.18.rda")
> dds <- DESeq(dds, quiet=TRUE)
> res2 <- results(dds)
> summary(abs(res$stat - res2$stat))
Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's
0.00e+00 6.00e-07 2.00e-06 3.00e-06 4.30e-06 2.23e-05        2
> summary(abs(res$pvalue - res2$pvalue))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.0e+00 3.0e-07 1.0e-06 1.2e-06 1.8e-06 5.3e-06       3
> summary(abs(res$padj - res2$padj))
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.0e+00 2.0e-07 2.0e-07 5.0e-07 2.0e-07 2.3e-05       3 

So, I did what you suggested (with some additions). I still had the environment created with v1.18, so I ran the v1.20 analysis and then compared.

First, by comparing the values of the columns:

> check1 <- vector()
> for (i in 1:ncol(res_20)) {
+   check1[i] <- all(res_18[, i] == res_20[, i], na.rm = TRUE)
+ }
> check1
[1]  TRUE FALSE FALSE FALSE FALSE FALSE
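As an aside, the per-column loop can be written in one step with sapply. A minimal sketch with toy data frames standing in for res_18/res_20 (the column names and values here are hypothetical, just to show the pattern):

```r
# Toy stand-ins for the two results tables
res_a <- data.frame(baseMean = c(10, 20), stat = c(1.5, -0.3), padj = c(0.01, 0.20))
res_b <- data.frame(baseMean = c(10, 20), stat = c(1.5, -0.4), padj = c(0.01, 0.25))

# One logical per column: TRUE if that column is identical in both tables
check1 <- sapply(seq_len(ncol(res_b)), function(i)
  all(res_a[[i]] == res_b[[i]], na.rm = TRUE))
check1  # TRUE FALSE FALSE
```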

Now, as suggested, I computed the difference between the results.

(First, check that the rownames match exactly:)

> check2 <- all(rownames(res_18) == rownames(res_20))
> check2
[1] TRUE

Now, the difference:

> dif <- as.data.frame(res_18)  # pre-allocate a data frame of the same shape
> for (i in 1:nrow(res_18)) {
+   dif[i, ] <- res_20[i, ] - res_18[i, ]
+ }
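The row-by-row loop can also be collapsed into a single vectorized subtraction, since numeric data frames subtract element-wise. A sketch on toy data frames (with DESeqResults objects you would first coerce via as.data.frame):

```r
# Hypothetical two-gene results tables for illustration
res_a <- data.frame(stat = c(1.50, -0.30), padj = c(0.010, 0.200),
                    row.names = c("gene1", "gene2"))
res_b <- data.frame(stat = c(1.52, -0.30), padj = c(0.012, 0.200),
                    row.names = c("gene1", "gene2"))

# Element-wise difference of all numeric columns at once
dif <- res_b - res_a  # stat differs by ~0.02 for gene1, padj by ~0.002
```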

> dif <- dif[order(dif$padj, decreasing = TRUE), ]
> head(dif)
                   baseMean log2FoldChange        lfcSE          stat       pvalue       padj
ENSMUSG00000096992        0   5.169783e-04 1.687938e-03  0.0040201034 2.019885e-03 0.04924934
ENSMUSG00000029190        0   5.193889e-08 2.530548e-05 -0.0001695293 9.630470e-05 0.04891005
ENSMUSG00000112846        0  -5.263351e-06 1.690702e-04  0.0001355298 7.715936e-05 0.04887932
ENSMUSG00000083431        0   5.882076e-05 3.126852e-04  0.0002190303 1.229127e-04 0.04887640
ENSMUSG00000081093        0  -9.029549e-06 1.883606e-04 -0.0001652798 9.436838e-05 0.04887206
ENSMUSG00000093405        0  -1.054840e-05 1.083485e-04 -0.0001479438 8.400461e-05 0.04886462

As you can see, there are sizeable differences in the adjusted p-values, which can produce clear differences between the versions. The next step is to re-do everything as you did (R 3.4, DESeq2 v1.18, the version I used the first time) and then compare against the current version I have (R 3.5, DESeq2 v1.20).

written 13 months ago by andrebolerbarros10

I just re-did the analysis with the different versions (R 3.4, DESeq2 v1.18.1 versus R 3.5, DESeq2 v1.20.0) and the difference remains.

written 13 months ago by andrebolerbarros10

As you requested:

> summary(dif$stat)
Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-4.576e-02 -8.796e-05  0.000e+00 -1.383e-05  6.629e-05  5.963e-02 

and also:

> summary(dif$padj)
Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
0.000   0.000   0.009   0.017   0.034   0.049    9788 

Note the p-values only differ by at most 0.002, right? I’m surprised this aggregates to such a big difference in adjusted p-values, but it’s possible given the nature of the method.
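One reason tiny p-value shifts can fan out in the adjusted values: Benjamini-Hochberg takes a running minimum over the ranked p-values, so perturbing a single p-value that sits near a cluster of similar p-values can move the adjusted value of every gene in that cluster. A small base-R illustration with synthetic p-values (not from DESeq2):

```r
# 101 synthetic p-values; in p_b a single one moves by just 1e-4
p_a <- c(1e-4, rep(0.04, 100))
p_b <- c(1e-4, rep(0.04, 99), 0.0401)

sum(p_a != p_b)  # only 1 raw p-value differs

padj_a <- p.adjust(p_a, method = "BH")
padj_b <- p.adjust(p_b, method = "BH")
sum(padj_a != padj_b)  # 100 adjusted p-values differ
```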

Moving forward, results are not guaranteed to be identical across versions. Unless there was a regression, which I don’t think there was, why not stick with one version for the analysis?
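If you do pin a version for the analysis, one lightweight habit (a general R suggestion, not DESeq2-specific) is to record the exact session alongside the results, so the numbers can always be traced back to the R and package versions that produced them:

```r
# Write the R version and all loaded package versions next to the results.
# The file name here is just an example.
writeLines(capture.output(sessionInfo()), "sessionInfo_deseq2_run.txt")
```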