All Pairwise Comparisons with DESeq2 While Controlling for Batch Effects: Does Reference Level Matter?
1
1
Entering edit mode
wunderl ▴ 30
@wunderl-20805
Last seen 2.1 years ago

I have the following design matrix:

batch treatment
A         1
A         2
A         3
B         1
B         2
B         3
C         1
C         2
C         3

> batch
[1] A A A B B B C C C
Levels: A B C

> treatment
[1] 1 2 3 1 2 3 1 2 3
Levels: 1 2 3


I would like to determine all of the pairwise differences between the three treatments while controlling for batch. There is no "control/baseline" level for either batch or treatment here, so there is no clear choice for the reference levels. So far I have been modeling this using the design ~ batch + treatment. I realize using the default approach in DESeq2 this produces the following:

#model matrix
(Intercept) batchB batchC treatment2 treatment3
1      0      0          0          0
1      0      0          1          0
1      0      0          0          1
1      1      0          0          0
1      1      0          1          0
1      1      0          0          1
1      0      1          0          0
1      0      1          1          0
1      0      1          0          1

#resultNames(dds)
[1] "Intercept"             "batch_B_vs_A"     "batch_C_vs_A"
[4] "treatment_2_vs_1" "treatment_3_vs_1"

#Missing comparison
results(dds, contrast = c("treatment", "2", "3"))


My concern with this is that both the reference level for batch and for treatment do not get coefficients in the resulting model but are instead convolved together in the intercept term. I am wondering if this would make the "missing" missing pairwise comparison inaccurate or different from the other two explicitly fit by the model. It also prevents running lfcShrink for the missing comparison when using apeglm.

I am wondering if the following expanded/full model matrix is the appropriate way to handle this situation:

(Intercept) batchA batchB batchC treatment1 treatment2 treatment3
1      1      0      0          1          0          0
1      1      0      0          0          1          0
1      1      0      0          0          0          1
1      0      1      0          1          0          0
1      0      1      0          0          1          0
1      0      1      0          0          0          1
1      0      0      1          1          0          0
1      0      0      1          0          1          0
1      0      0      1          0          0          1


This would theoretically give me coefficients for all 3 levels of both batch and treatment, and would allow me to run lfcShrink for all 3 possible pairwise comparisons between treatments. This would make the intercept term effectively 0 however, and would also remove the reference against with the fold changes are being determined. This all seems problematic.

Is this an appropriate approach, or is there another alternative approach I should be using?

0
Entering edit mode
@mikelove
Last seen 3 hours ago
United States

It doesn't make a difference numerically, but it just takes one more step to run lfcShrink. If you look up in the vignette, we have some simple code to produce e.g. the 3 vs 2 effect as one of the coefficients in resultsNames(dds).