Question

Unexpected p-values by DESeq2

xianglongruoying • 0

@xianglongruoying-20434

Dear Community,

I have a question about the p-values reported by DESeq2. I performed differential expression analysis between cases and controls. The raw p-value reported for gene A with the following FPKM values is only 1.95E-04:

case group: 1.95, 1.84, 0, 0, 0, 0, 0.01, 0.28, 0, 0.01, 0, 0, 0

control group: 63.23, 81.76, 75.39, 57.81, 44.48, 67.62, 51.98, 38.09, 80.06, 46.84, 90.77, 81.71, 64.62, 74.59

But the raw p-value for gene B with the following FPKM values is 9.97E-30, which is much more significant:

case group: 19.33, 28.04, 23.6, 24.74, 23.5, 24.75, 17.92, 23.05, 16.72, 22.5, 25.94, 19.36, 20.3

control group: 38.71, 37.73, 36.04, 36.44, 53.25, 35.3, 34.58, 33.22, 46.12, 34.23, 43.95, 38.55, 35.11, 44.82

I know DESeq2 takes raw read counts as input, and I did use read counts for the differential expression analysis. However, the DESeq2-normalized counts for these two genes follow the same pattern as the FPKM values.

What I don't understand is this: gene A should have a much more significant p-value than gene B, since gene A has almost no expression in cases, but DESeq2 does not report it that way. Using shrinkage or not doesn't seem to matter; I tried both.

I would appreciate any explanation of this.

Thank you so much!!!

deseq2 normalization • 685 views

ADD COMMENT • link updated 5.0 years ago by Michael Love 41k • written 5.0 years ago by xianglongruoying • 0

score 0 · Answer 1 · 2019-04-06

Michael Love 41k

@mikelove

United States

Take a look at plotCounts() for these genes. This may help you visualize the results.

ADD COMMENT • link 5.0 years ago Michael Love 41k

Hi Michael,

Thank you for your reply. I did take a look at plotCounts(). Here is the plot for gene A: https://ibb.co/xD0Y7Hv

Here is the plot for gene B: https://ibb.co/KqGvqC9

I still don't get it. To me, gene A should have the smaller p-value.

Thanks for your time!!!

ADD REPLY • link 5.0 years ago xianglongruoying • 0

Hard to say. In the end I focus on FDR sets and LFC rather than p-values (see the DESeq2 paper or the apeglm paper for discussion), so I'm not very concerned with a tiny vs. a very, very tiny p-value.

ADD REPLY • link 5.0 years ago Michael Love 41k

Hi Michael,

Thanks again for your reply. The FDR for gene A is still bigger than for gene B, of course. I agree with you that as long as both genes are significant after FDR correction, it's no big deal. But I am still wondering whether there is an explanation for this discrepancy. When we showed the plotCounts figures and the p-values for these two genes in the manuscript, reviewers questioned our analysis.

Thanks so much for your time.

ADD REPLY • link 5.0 years ago xianglongruoying • 0

Many things go into the standard error (SE) of an LFC, which is what drives the Wald test. The count level in both groups and the within-group dispersion are factors. I think the gene with the smaller p-value has lower dispersion. If you used an LRT, the p-values may be closer to each other.
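As a rough illustration of how count level and dispersion enter the Wald statistic, here is a minimal sketch in plain Python. It is not DESeq2's implementation: it assumes a simple two-group negative binomial model with a known dispersion, and it ignores size factors and dispersion shrinkage. Under that assumption, the variance of each group's estimated log mean is approximately (1/mean + dispersion) / n. The function names and all of the numbers are hypothetical, chosen only to resemble the two genes in the question.

```python
import math

def wald_lfc_se(mu_case, mu_ctrl, n_case, n_ctrl, dispersion):
    """Approximate SE of the log2 fold change for a two-group negative
    binomial model with a log link and a known dispersion (an assumption)."""
    # Variance of each group's estimated natural-log mean.
    var_case = (1.0 / mu_case + dispersion) / n_case
    var_ctrl = (1.0 / mu_ctrl + dispersion) / n_ctrl
    # Convert the SE from the natural-log scale to the log2 scale.
    return math.sqrt(var_case + var_ctrl) / math.log(2)

def wald_p_value(lfc, se):
    """Two-sided p-value from a standard normal Wald statistic."""
    z = abs(lfc) / se
    return math.erfc(z / math.sqrt(2))

# Hypothetical "gene A": near-zero counts in cases, high dispersion.
lfc_a = math.log2(0.2 / 500.0)
se_a = wald_lfc_se(mu_case=0.2, mu_ctrl=500.0, n_case=13, n_ctrl=14, dispersion=8.0)
p_a = wald_p_value(lfc_a, se_a)

# Hypothetical "gene B": moderate counts in both groups, low dispersion.
lfc_b = math.log2(200.0 / 350.0)
se_b = wald_lfc_se(mu_case=200.0, mu_ctrl=350.0, n_case=13, n_ctrl=14, dispersion=0.02)
p_b = wald_p_value(lfc_b, se_b)

print(f"gene A: LFC={lfc_a:.2f}  SE={se_a:.3f}  p={p_a:.3g}")
print(f"gene B: LFC={lfc_b:.2f}  SE={se_b:.3f}  p={p_b:.3g}")
```

With these made-up inputs, gene A has a far larger fold change but also a far larger SE, because both the 1/mean term for the near-zero group and the high dispersion inflate it. Its Wald p-value therefore ends up less extreme than gene B's.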

ADD REPLY • link 5.0 years ago Michael Love 41k

Hi Michael,

Thank you for your reply again.

I tried the LRT, and the p-values for the two genes did not move any closer together. For gene A, the p-value is 0.0064. For gene B, the p-value is 2.55E-29. Are there other parameters I could set when I run DESeq so that I get p-values of a similar magnitude for these two genes?