Question

DESeq2 pairwise comparisons w/ 3 factors

1

Entering edit mode

aharkey ▴ 10

@aharkey-23107

Last seen 5.9 years ago

I'm analyzing an RNA-Seq dataset with three different factors: genotype, treatment, and time. I need to do a pairwise comparison of each time to time zero for every genotype and treatment combination. A simplified version of my coldata looks like this:

    > coldata
    geno  treat time
1     A    1    0
2     A    1    1
3     A    1    4
4     A    1   24
5     B    1    0
6     B    1    1
7     B    1    4
8     B    1   24
9     C    1    0
10    C    1    1
11    C    1    4
12    C    1   24
13    A    2    0
14    A    2    1
15    A    2    4
16    A    2   24
17    B    2    0
18    B    2    1
19    B    2    4
20    B    2   24
21    C    2    0
22    C    2    1
23    C    2    4
24    C    2   24

So, for example, I need to compare genoA/treat1/time1 vs. genoA/treat1/time0, genoA/treat1/time4 vs. genoA/treat1/time0, and genoA/treat1/time24 vs. genoA/treat1/time0, and so on for each geno/treat combination.

I've done a lot of reading, and it appears that the easiest way to do pairwise comparisons for sub-groups is to create a new column in coldata pasting together two or more of the factors together, so each combination of factors is treated as one separate group (here I have named this new column multi, and the first entry is genoA.treat1.0).

I tried this, and felt like there were 2 main problems:

This method doesn't maintain the relationships between the original factors (i.e. as far as the model knows, genoA/treat1/time0 should be no more similar to genoA/treat1/time1 than to genoB/treat2/time24; they're all just separate groups). Does this matter?
If I want to do multiple comparisons with different denominators, I have to use results() so I can manually enter a contrast, but that means I cannot do lfcShrink(). For example,

results(dds, alpha = 0.05, contrast = c("multi", "treat1.genoA.1", "treat1.genoA.0"))

works fine for all comparisons, and I can do all needed comparisons after running the model (DESeq(dds)) once.

However, if I want to use lfcShrink() (which I do, as it seems to be the main reason to choose DESeq2 over other methods), it will only take as a coef or contrast the contrasts specified in resultsNames(dds), so the denominator is determined by the factor levels of multi (by default treat1.genoA.0). So to run the comparisons with another denominator, such as treat1.genoB.0, I have to relevel coldata$multi, which then requires re-running DESeq2(dds). This is both computationally time-consuming, and it also doesn't feel like best practice to be re-running the model over and over like this.

Is there an easier way to do these comparisons that I am missing?

Thanks, Andria

deseq2 • 1.8k views

ADD COMMENT • link updated 5.9 years ago by Michael Love 43k • written 5.9 years ago by aharkey ▴ 10

score 4 · Accepted Answer · 2020-03-15

4

Entering edit mode

Michael Love 43k

@mikelove

Last seen 14 minutes ago

United States

Q1: no it doesn't matter

Q2: You can use type="ashr" with contrast, where you can specify whichever pair you want. Ashr performs very well for shrinkage of LFC for RNA-seq. (Also by the way, for type="apeglm", you don't need to re-run DESeq, only nbinomWaldTest to regenerate the MLE coefficients).