Question

Understanding DESeq2 contrasts and p-values

2

Entering edit mode

laura.baxter ▴ 30

@ceb3696b

Last seen 3.4 years ago

Scenario: I have a dataset with 3 replicates for each of 12 treatments (say A...L), and I am interested in identifying DEGs occurring between particular treatment pairs (e.g. A vs B).

Observation: If I create the dds from the whole dataset and choose a specific contrast, I get different baseMean, lfcSE, pvalue and padj values than I do if I create the dds using only data from the pair of interest.

CONTRAST SPECIFIED FROM WITHIN DDS OF ALL TREATMENTS:

results(dds, contrast=c("condition","A","B"))
log2 fold change (MLE): condition A vs B 
Wald test p-value: condition A vs B 
DataFrame with 19189 rows and 6 columns
            baseMean log2FoldChange     lfcSE      stat      pvalue      padj
           <numeric>      <numeric> <numeric> <numeric>   <numeric> <numeric>
ATCG00020 5873.22144       2.926121  1.021244   2.86525 0.004166788  0.402878
ATCG00040   55.49366       2.556820  0.772141   3.31134 0.000928506  0.321590
ATCG00050    4.24306       0.911677  1.097440   0.83073 0.406125929  0.895508
ATCG00070  129.27108       1.041406  0.570771   1.82456 0.068067257  0.722884
ATCG00080  282.81681       1.066214  0.789473   1.35054 0.176842976  0.811457

...

RESULT FROM DATA INPUT ONLY FOR TREATMENT A and B:

results(AB_dds)
log2 fold change (MLE): condition A vs B 
Wald test p-value: condition A vs B 
DataFrame with 17890 rows and 6 columns
           baseMean log2FoldChange     lfcSE      stat      pvalue        padj
          <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
ATCG00020 4869.7243        2.92651  0.331037   8.84042 9.53567e-19 1.44065e-14
ATCG00040   65.4715        2.55531  0.572043   4.46698 7.93303e-06 2.66338e-03
ATCG00070  134.0186        1.03323  0.320381   3.22499 1.25976e-03 1.00171e-01
ATCG00080  313.3158        1.06132  0.373712   2.83994 4.51217e-03 2.24243e-01
ATCG00120  224.4218        2.09119  0.363608   5.75122 8.86001e-09 6.37414e-06

...

I'm slightly surprised by this difference.

In general, when looking at a particular contrast, what effect are the other samples outside of the contrast having on the results, and why?

DESeq2 • 1.7k views

ADD COMMENT • link updated 3.4 years ago by Michael Love 41k • written 3.4 years ago by laura.baxter ▴ 30

1

Entering edit mode

Replying to my own question! I think this part of the FAQ explains why, in our case, adding more groups (including all conditions) makes the model worse by inflating the per-gene dispersion estimate. The PCA plot indeed shows groups with high within-group variability. http://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#if-i-have-multiple-groups-should-i-run-all-together-or-split-into-pairs-of-groups

ADD REPLY • link 3.4 years ago laura.baxter ▴ 30

1

Entering edit mode

Hi. Precisely your PCA plot is the guiding plot whether to split by group or take all together. I too had a similar discussion here:Strategy to perform DESeq, DEXSeq workflow

Hope this will help. If you have a lot of intrasample variability (biological replicates differ drastically), you should split by groups.

ADD REPLY • link 3.4 years ago rohitsatyam102 ▴ 20

score 0 · Answer 1 · 2020-11-28

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 13 hours ago

United States

Marking this as answered...

ADD COMMENT • link 3.4 years ago Michael Love 41k