Question

non-significant p-values from DESeq2 for some genes, visually the data looks significant

RohitGarg • 0

@rohitgarg-21985

I am getting non-significant p-values for several genes that "visually" look significant. Here is one such gene.

NAME: ENSMUSG00000026582
DESCRIPTION: Sele

Values per sample group (replicates in order):
BrainVEC1-3: 21.283583, 29.655133, 270.802420
BrainNVEC1-3: 4.066718, 4.215113, 37.817222
ChPVEC1-6: 472.788450, 149.998929, 113.672655, 247.821995, 186.434232, 495.219900
ChPNVEC1-6: 139.739530, 6.182646, 12.128706, 8.289135, 43.592426, 46.515864
DuraVEC1-3: 4786.500353, 2903.643008, 3419.749856
DuraNVEC1-3: 1123.774279, 376.820845, 645.502343
DuraLEC1-5: 273.998142, 113.368945, 101.752742, 1741.593662, 1801.201691
PiaVEC1-3: 165.734591, 402.232505, 418.380663
PiaNVEC1-3: 18.586756, 2.654553, 5.152745
ParenchymaVEC1-3: 112.342769, 8.485167, 2418.136018
ParenchymaNVEC1-3: 2.525383, 3.192719, 1.239111

Visually the gene looks significant, but when I run a DE analysis contrasting DuraVEC vs. DuraNVEC (samples DuraVEC1-3 and DuraNVEC1-3 above), I get the following result:

Gene,BaseMean,Log2FC,LfcSE,Stat,Pvalue,Padj
Sele,564.212036,2.372231,1.203575,1.970987,4.872537e-02,2.005588e-01

Any help is appreciated. Thank you!

deseq2 • 698 views

written 4.6 years ago by RohitGarg • updated 4.6 years ago by Michael Love

score 0 · Answer 1 · 2019-09-25

Michael Love 41k

@mikelove

The p-value here (0.04) seems to me to reflect that there is some evidence against the null, but there are also only 3 samples per group and some moderate within-group variance.

4.6 years ago • Michael Love

Hi Michael, most of our data have 3 biological replicates with 3 technical replicates each. Some data have up to six replicates with no technical replicates. The technical replicates are collapsed prior to normalization, as per DESeq2. What do you mean by "evidence against the null"? Thanks!

4.6 years ago • RohitGarg

Take a look at the DESeq2 paper for full details. In brief, we compute a p-value which evaluates the probability of seeing a test statistic as large or larger if LFC=0. This particular gene and contrast gives 0.04, which is low. It's just not low enough: the adjusted p-value reported for this gene is 0.20. A big factor here is the variability of this gene and the borrowing of information from other genes. Also, n=3 means that you need the differences to be much larger than the variability within groups.
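As a rough illustration of where that 0.04 comes from, the sketch below reproduces the Stat and Pvalue columns from the results row in the question. It assumes the reported statistic is a Wald statistic (Log2FC divided by LfcSE) compared against a standard normal distribution, which is DESeq2's default test. The function name `wald_p_value` is made up for this example and is not part of DESeq2.

```python
import math

def wald_p_value(lfc, lfc_se):
    """Two-sided p-value for H0: LFC = 0 under a standard normal reference.

    Hypothetical helper for illustration only; not a DESeq2 function.
    """
    stat = lfc / lfc_se
    # P(|Z| >= |stat|) for Z ~ N(0, 1), via the complementary error function
    p = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p

# Log2FC and LfcSE for Sele, copied from the results row in the question
stat, p = wald_p_value(2.372231, 1.203575)
print(f"stat = {stat:.6f}, p = {p:.6f}")
```

The printed values should agree with the Stat (1.970987) and Pvalue (4.872537e-02) columns up to rounding of the inputs. The Padj column is a separate multiple-testing adjustment across all genes and cannot be recovered from this one row.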

Here's a really simple and naive computation, just to give an idea: the SD of the counts in the two groups is ~400 and ~1000. The difference in mean between the two groups is ~700. So the difference is on the scale of the SD (here, really simple and just looking at counts). Another way to think about it is to presume the observed effect size of ~1 SD is real and not due to the null. A t-test has 16% power to detect a difference of 1 SD with n=3 vs 3. You actually need n=9 per group to get above 50% power.
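The power figures quoted above can be checked with a small simulation. This is only a sketch under stated assumptions: normally distributed data with unit SD in both groups, a true difference of 1 SD, an equal-variance two-sample t-test, and a two-sided 5% threshold. The critical values are standard t-table entries for 4 and 16 degrees of freedom, and the names `pooled_t` and `simulated_power` are made up for this example.

```python
import math
import random

# Two-sided 5% critical values of Student's t for df = 2n - 2
# (standard table values: df = 4 and df = 16)
T_CRIT = {3: 2.776, 9: 2.120}

def pooled_t(x, y):
    """Equal-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)  # pooled variance
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def simulated_power(n, effect_sd=1.0, trials=20000, seed=1):
    """Fraction of simulated experiments in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(effect_sd, 1.0) for _ in range(n)]  # group shifted by the effect
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]        # reference group
        if abs(pooled_t(a, b)) > T_CRIT[n]:
            hits += 1
    return hits / trials

for n in (3, 9):
    print(f"n = {n} per group: estimated power = {simulated_power(n):.3f}")
```

Up to simulation noise, the estimates should land near the 16% (n=3) and roughly 50% (n=9) figures mentioned above. Real count data analyzed with DESeq2 will not behave exactly like this idealized normal model.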

4.6 years ago • Michael Love

OK, got it. The t-test gives only a marginally better p-value of 0.022. Thank you for your help!

4.6 years ago • RohitGarg