Understanding DESeq2 contrasts and p-values
1
2
Entering edit mode
laura.baxter ▴ 30
@ceb3696b
Last seen 3.4 years ago

Scenario: I have a dataset with 3 replicates for each of 12 treatments (say A...L), and I am interested in identifying DEGs occurring between particular treatment pairs (e.g. A vs B).

Observation: If I create the dds from the whole dataset and choose a specific contrast, I get different baseMean, lfcSE, pvalue and padj values than I do if I create the dds using only data from the pair of interest.

CONTRAST SPECIFIED FROM WITHIN DDS OF ALL TREATMENTS:

results(dds, contrast=c("condition","A","B"))
log2 fold change (MLE): condition A vs B 
Wald test p-value: condition A vs B 
DataFrame with 19189 rows and 6 columns
            baseMean log2FoldChange     lfcSE      stat      pvalue      padj
           <numeric>      <numeric> <numeric> <numeric>   <numeric> <numeric>
ATCG00020 5873.22144       2.926121  1.021244   2.86525 0.004166788  0.402878
ATCG00040   55.49366       2.556820  0.772141   3.31134 0.000928506  0.321590
ATCG00050    4.24306       0.911677  1.097440   0.83073 0.406125929  0.895508
ATCG00070  129.27108       1.041406  0.570771   1.82456 0.068067257  0.722884
ATCG00080  282.81681       1.066214  0.789473   1.35054 0.176842976  0.811457

...

RESULT FROM DATA INPUT ONLY FOR TREATMENT A and B:

results(AB_dds)
log2 fold change (MLE): condition A vs B 
Wald test p-value: condition A vs B 
DataFrame with 17890 rows and 6 columns
           baseMean log2FoldChange     lfcSE      stat      pvalue        padj
          <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
ATCG00020 4869.7243        2.92651  0.331037   8.84042 9.53567e-19 1.44065e-14
ATCG00040   65.4715        2.55531  0.572043   4.46698 7.93303e-06 2.66338e-03
ATCG00070  134.0186        1.03323  0.320381   3.22499 1.25976e-03 1.00171e-01
ATCG00080  313.3158        1.06132  0.373712   2.83994 4.51217e-03 2.24243e-01
ATCG00120  224.4218        2.09119  0.363608   5.75122 8.86001e-09 6.37414e-06

...

I'm slightly surprised by this difference.

In general, when looking at a particular contrast, what effect are the other samples outside of the contrast having on the results, and why?

DESeq2 • 1.7k views
ADD COMMENT
1
Entering edit mode

Replying to my own question! I think this part of the FAQ explains why, in our case, adding more groups (including all conditions) makes the model worse by inflating the per-gene dispersion estimate. The PCA plot indeed shows groups with high within-group variability. http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups

ADD REPLY
1
Entering edit mode

Hi. Precisely your PCA plot is the guiding plot whether to split by group or take all together. I too had a similar discussion here:Strategy to perform DESeq, DEXSeq workflow

Hope this will help. If you have a lot of intrasample variability (biological replicates differ drastically), you should split by groups.

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

Marking this as answered...

ADD COMMENT

Login before adding your answer.

Traffic: 932 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6