I'm analyzing an RNA-Seq dataset with three different factors: genotype, treatment, and time. I need to do a pairwise comparison of each time to time zero for every genotype and treatment combination. A simplified version of my coldata looks like this:
> coldata
geno treat time
1 A 1 0
2 A 1 1
3 A 1 4
4 A 1 24
5 B 1 0
6 B 1 1
7 B 1 4
8 B 1 24
9 C 1 0
10 C 1 1
11 C 1 4
12 C 1 24
13 A 2 0
14 A 2 1
15 A 2 4
16 A 2 24
17 B 2 0
18 B 2 1
19 B 2 4
20 B 2 24
21 C 2 0
22 C 2 1
23 C 2 4
24 C 2 24
So, for example, I need to compare genoA/treat1/time1 vs. genoA/treat1/time0, genoA/treat1/time4 vs. genoA/treat1/time0, and genoA/treat1/time24 vs. genoA/treat1/time0, and so on for each geno/treat combination.
I've done a lot of reading, and it appears that the easiest way to do pairwise comparisons for sub-groups is to create a new column in coldata
pasting together two or more of the factors together, so each combination of factors is treated as one separate group (here I have named this new column multi
, and the first entry is genoA.treat1.0
).
I tried this, and felt like there were 2 main problems:
This method doesn't maintain the relationships between the original factors (i.e. as far as the model knows, genoA/treat1/time0 should be no more similar to genoA/treat1/time1 than to genoB/treat2/time24; they're all just separate groups). Does this matter?
If I want to do multiple comparisons with different denominators, I have to use
results()
so I can manually enter a contrast, but that means I cannot dolfcShrink()
. For example,
results(dds, alpha = 0.05, contrast = c("multi", "treat1.genoA.1", "treat1.genoA.0"))
works fine for all comparisons, and I can do all needed comparisons after running the model (DESeq(dds)
) once.
However, if I want to use lfcShrink()
(which I do, as it seems to be the main reason to choose DESeq2 over other methods), it will only take as a coef
or contrast
the contrasts specified in resultsNames(dds)
, so the denominator is determined by the factor levels of multi
(by default treat1.genoA.0
). So to run the comparisons with another denominator, such as treat1.genoB.0
, I have to relevel coldata$multi
, which then requires re-running DESeq2(dds)
. This is both computationally time-consuming, and it also doesn't feel like best practice to be re-running the model over and over like this.
Is there an easier way to do these comparisons that I am missing?
Thanks, Andria
Thank you! ashr is working well for what I need.