Dear Community,
I have a question about the p-values reported by DESeq2. I performed differential expression analysis between cases and controls. The raw p-value reported for gene A with the following FPKM values is only 1.95E-04:
case group: 1.95, 1.84, 0, 0, 0, 0, 0.01, 0.28, 0, 0.01, 0, 0, 0
control group: 63.23, 81.76, 75.39, 57.81, 44.48, 67.62, 51.98, 38.09, 80.06, 46.84, 90.77, 81.71, 64.62, 74.59
But the raw p-value for gene B with the following FPKM values is 9.97E-30, which is much more significant:
case group: 19.33, 28.04, 23.6, 24.74, 23.5, 24.75, 17.92, 23.05, 16.72, 22.5, 25.94, 19.36, 20.3
control group: 38.71, 37.73, 36.04, 36.44, 53.25, 35.3, 34.58, 33.22, 46.12, 34.23, 43.95, 38.55, 35.11, 44.82
I know DESeq2 takes raw read counts as input, and I did use read counts for the differential expression analysis. However, the normalized counts from DESeq2 for these two genes follow the same pattern as the FPKM values.
What I don't understand is why gene A does not have a much more significant p-value than gene B, given that gene A has almost no expression in cases. Using shrinkage or not doesn't seem to matter, as I tried both.
I would appreciate any explanation of this.
Thank you so much!!!
Hi Michael,
Thank you for your reply. I did take a look at plotCounts(). Here is the plot for gene A: https://ibb.co/xD0Y7Hv
Here is the plot for gene B: https://ibb.co/KqGvqC9
I still don't get it. To me, gene A should have the smaller p-value.
Thanks for your time!!!
Hard to say. In the end I focus on FDR sets and LFC rather than p-values (see the DESeq2 paper or the apeglm paper for discussion). So I'm not very concerned with a tiny vs. a very, very tiny p-value.
Hi Michael,
Thanks again for your reply. The FDR for gene A is still bigger than that for gene B, of course. I agree with you that as long as both genes are significant after FDR correction, it's no big deal. But I am still wondering whether there is an explanation for this discrepancy. When we showed the plotCounts() figures and the p-values for these two genes in the manuscript, reviewers questioned our analysis.
Thanks so much for your time.
Many things go into the standard error (SE) of a log fold change (LFC), and the LFC relative to its SE is what drives the Wald test. The level of the counts in both groups and the within-group dispersion are factors. The gene with the smaller p-value has lower dispersion, I think. If you use an LRT (likelihood ratio test), the p-values may be closer to each other.
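As a rough illustration of this point (a sketch only, not DESeq2's actual computation): a Wald test turns the ratio LFC / SE into a two-sided normal p-value, so a very large LFC with a large SE can be less significant than a modest LFC with a small SE. The LFC and SE values below are hypothetical, chosen just to show the effect; they are not the estimates DESeq2 produced for these genes. The coefficient of variation is computed on the case-group FPKM values from the original post and is only a crude stand-in for the negative binomial dispersion that DESeq2 estimates from counts.

```python
import math
import statistics


def wald_p_value(lfc, se):
    """Two-sided normal p-value for the Wald statistic z = lfc / se."""
    z = lfc / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a crude spread measure)."""
    return statistics.stdev(values) / statistics.mean(values)


# Case-group FPKM values copied from the original post.
gene_a_cases = [1.95, 1.84, 0, 0, 0, 0, 0.01, 0.28, 0, 0.01, 0, 0, 0]
gene_b_cases = [19.33, 28.04, 23.6, 24.74, 23.5, 24.75, 17.92,
                23.05, 16.72, 22.5, 25.94, 19.36, 20.3]

cv_a = coefficient_of_variation(gene_a_cases)
cv_b = coefficient_of_variation(gene_b_cases)

# HYPOTHETICAL LFC and SE values (assumptions for illustration only):
# a huge fold change estimated imprecisely vs. a modest one estimated precisely.
p_big_lfc_big_se = wald_p_value(lfc=-8.0, se=2.1)
p_small_lfc_small_se = wald_p_value(lfc=-0.8, se=0.07)

print(f"relative spread in cases: gene A CV = {cv_a:.2f}, gene B CV = {cv_b:.2f}")
print(f"hypothetical Wald p-values: {p_big_lfc_big_se:.3g} vs {p_small_lfc_small_se:.3g}")
```

In this sketch the gene with the near-zero case group has much larger relative spread, and the hypothetical precise-but-modest LFC ends up with the far smaller p-value, which is the same direction as the pattern in the question.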
Hi Michael,
Thank you for your reply again.
I tried the LRT, and the p-values for the two genes did not get any closer. For gene A, the p-value is 0.0064. For gene B, the p-value is 2.55E-29. Are there other parameters I could set when I run DESeq() so that I get the same level of p-value for these two genes?
No, I don't think so. It is what it is.