Hi!

This is more of a statistical issue. I have a metabolic pathway dataset that I ran through DESeq2, and my question is:

Why did several different p-values end up with the same adjusted p-value?


baseMean     log2FoldChange  lfcSE        stat          pvalue       padj
102.4139421  -2.779259157    0.879336287  -3.160632853  0.001574268  0.165229376
97.44804027  -2.295013504    0.735995467  -3.118244073  0.001819321  0.165229376
224.9746369  -1.223071776    0.400516301  -3.053737822  0.002260095  0.165229376
114.9366424  -1.813971541    0.621330508  -2.919495368  0.003505986  0.165229376
111.8579053  -1.748819203    0.620374587  -2.818972989  0.004817757  0.165229376
83.94472937  -2.147032598    0.771273039  -2.783751654  0.005373416  0.165229376


I do not think it is related to the sample size (similar datasets give quite variable - significant results). I do not think there is anything particularly wrong with the results; I would just like to know a bit more about the reason.

Any answer, tip or help would be really appreciated :)

Jesus

DESeq2 Statistics
I would recommend migrating this to stats.stackexchange.com because it's more a stats question than a package-specific question.

Look up how the Benjamini-Hochberg adjustments are made. You can do it yourself in Excel.

@mikelove

This is a direct consequence of the method of Benjamini and Hochberg. The paper is quite accessible and worth a look:

http://www.jstor.org/stable/2346101

See formula (1) under "False Discovery Rate Controlling Procedure".
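For anyone who cannot open the paper, here is a sketch of that procedure in my own notation, so treat it as a paraphrase and not a quotation. The symbols p_(i), m and q are the ones used in this answer. The second expression is the adjusted p-value as, to my understanding, R's p.adjust(method = "BH") computes it; it is not a formula taken from the paper.

```latex
% Sorted p-values: p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}
k = \max\left\{ i : p_{(i)} \le \frac{i}{m}\, q \right\},
\qquad \text{reject } H_{(i)} \text{ for all } i = 1, \dots, k

% Adjusted p-value: the smallest q at which H_{(i)} is rejected
\tilde{p}_{(i)} = \min\left( 1,\; \min_{j \ge i} \frac{m}{j}\, p_{(j)} \right)
```

The minimum over j >= i is what lets a smaller p-value inherit the adjusted value of a larger one.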

All the p-values p_i up to p_k are also rejected at FDR q, which is why several different p-values can share one adjusted p-value. For a visualization of this formula, take a few sorted p-values and plot them with each adjusted p-value written above its point:

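p <- sort(c(.01,.2,.21,.22,.5,.51,.52,.8,.9))
padj <- p.adjust(p, method="BH")  # Benjamini-Hochberg adjusted p-values
plot(p,ylim=c(0,1),xlim=c(0,length(p)))
text(seq_along(p), p, round(padj, 2), pos=3)  # write each padj above its point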


Here, k = 4, m = length(p), and q = padj[4]. The second, third and fourth p-values all receive the adjusted value of the fourth.
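To see where the ties come from, the adjustment can also be done by hand, much as you would in a spreadsheet. This is only a minimal sketch on the toy p-values above, not on the poster's data. The variable names raw, manual and builtin are mine, and it assumes that p.adjust(method = "BH") is the adjustment being asked about.

```r
# Toy p-values from the example above (not the poster's DESeq2 results).
p <- sort(c(.01, .2, .21, .22, .5, .51, .52, .8, .9))
m <- length(p)

# Step 1: scale each sorted p-value by m / rank.
raw <- p * m / seq_len(m)

# Step 2: for each rank, take the smallest scaled value at that rank or any
# later rank (a running minimum from the right), capped at 1.
manual <- pmin(1, rev(cummin(rev(raw))))

# Compare with the built-in adjustment.
builtin <- p.adjust(p, method = "BH")
print(data.frame(p = p, raw = round(raw, 3), padj = round(manual, 3)))
stopifnot(isTRUE(all.equal(manual, builtin)))
```

In the raw column the values at ranks 2 and 3 are larger than the value at rank 4, so the running minimum replaces them with the rank-4 value. Equal padj values in a DESeq2 results table come about the same way.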

(This is copied from an answer I posted to the Bioc list in 2013, with updated formatting.)

Looks quite relevant for my project :) - Thanks!