I'm analyzing an RNA-Seq dataset with three different factors: genotype, treatment, and time. I need to do a pairwise comparison of each time to time zero for every genotype and treatment combination. A simplified version of my coldata looks like this:
> coldata geno treat time 1 A 1 0 2 A 1 1 3 A 1 4 4 A 1 24 5 B 1 0 6 B 1 1 7 B 1 4 8 B 1 24 9 C 1 0 10 C 1 1 11 C 1 4 12 C 1 24 13 A 2 0 14 A 2 1 15 A 2 4 16 A 2 24 17 B 2 0 18 B 2 1 19 B 2 4 20 B 2 24 21 C 2 0 22 C 2 1 23 C 2 4 24 C 2 24
So, for example, I need to compare genoA/treat1/time1 vs. genoA/treat1/time0, genoA/treat1/time4 vs. genoA/treat1/time0, and genoA/treat1/time24 vs. genoA/treat1/time0, and so on for each geno/treat combination.
I've done a lot of reading, and it appears that the easiest way to do pairwise comparisons for sub-groups is to create a new column in
coldata pasting together two or more of the factors together, so each combination of factors is treated as one separate group (here I have named this new column
multi, and the first entry is
I tried this, and felt like there were 2 main problems:
This method doesn't maintain the relationships between the original factors (i.e. as far as the model knows, genoA/treat1/time0 should be no more similar to genoA/treat1/time1 than to genoB/treat2/time24; they're all just separate groups). Does this matter?
If I want to do multiple comparisons with different denominators, I have to use
results()so I can manually enter a contrast, but that means I cannot do
lfcShrink(). For example,
results(dds, alpha = 0.05, contrast = c("multi", "treat1.genoA.1", "treat1.genoA.0"))
works fine for all comparisons, and I can do all needed comparisons after running the model (
However, if I want to use
lfcShrink() (which I do, as it seems to be the main reason to choose DESeq2 over other methods), it will only take as a
contrast the contrasts specified in
resultsNames(dds), so the denominator is determined by the factor levels of
multi (by default
treat1.genoA.0). So to run the comparisons with another denominator, such as
treat1.genoB.0, I have to relevel
coldata$multi, which then requires re-running
DESeq2(dds). This is both computationally time-consuming, and it also doesn't feel like best practice to be re-running the model over and over like this.
Is there an easier way to do these comparisons that I am missing?
Thank you! ashr is working well for what I need.