Hello,
I was hoping someone could help me determine which of the following contrast formulations is correct. Like others, I am getting confused about what to specify when one of the treatments being contrasted is the reference level.
Here is an example which aligns with my real dataset. In this case, I did RNASeq on samples collected from different Donors, each of which was given a different treatment. I am interested in the contrasts between each pair of treatments, while blocking for Donor. Specifically, I want to test for DEGs for B vs. A, C vs. A, and C vs. B.
> eg = data.frame(Donor = rep(letters[1:5], each = 3),
+ Trt = rep(LETTERS[1:3], 5))
> eg
Donor Trt
1 a A
2 a B
3 a C
4 b A
5 b B
6 b C
7 c A
8 c B
9 c C
10 d A
11 d B
12 d C
13 e A
14 e B
15 e C
In the edgeR manual, it suggests putting blocking variables first, and treatment variables second, into the design matrix. The first two attempts at building a design matrix and contrasts below follow this paradigm. They differ by the exclusion of the intercept. While I think I understand how the intercept (or not) works when there are only treatment variables, I realize I am not so sure how it works when there are also blocking variables.
It is clearer to me to do the opposite, however, and also provide no intercept, so that I can specify A, B, and C directly in the contrasts. So, assuming it is doing what I think it is doing, the third option below is my preferred formulation.
I'd very much appreciate insight into which formula(e) and contrasts are testing what I'm looking for. Thanks!
Try 1:
> #try 1, no intercept, blocking variable first
> design1 = model.matrix(~0 + Donor + Trt, data = eg)
> colnames(design1)
[1] "Donora" "Donorb" "Donorc" "Donord" "Donore" "TrtB" "TrtC"
> makeContrasts(B_A = TrtB,
+ C_A = TrtC,
+ C_B = TrtC - TrtB, levels = design1)
Contrasts
Levels B_A C_A C_B
Donora 0 0 0
Donorb 0 0 0
Donorc 0 0 0
Donord 0 0 0
Donore 0 0 0
TrtB 1 0 -1
TrtC 0 1 1
Try 2:
> # try 2, intercept, blocking variable first
> design2 = model.matrix(~Donor + Trt, data = eg)
> colnames(design2)
[1] "(Intercept)" "Donorb" "Donorc" "Donord" "Donore" "TrtB" "TrtC"
> makeContrasts(B_A = TrtB,
+ C_A = TrtC,
+ C_B = TrtC - TrtB, levels = design2)
Contrasts
Levels B_A C_A C_B
Intercept 0 0 0
Donorb 0 0 0
Donorc 0 0 0
Donord 0 0 0
Donore 0 0 0
TrtB 1 0 -1
TrtC 0 1 1
Warning message:
In makeContrasts(B_A = TrtB, C_A = TrtC, C_B = TrtC - TrtB, levels = design2) :
Renaming (Intercept) to Intercept
Try 3:
> # try 3, no intercept, treatment variable first
> design3 = model.matrix(~0 + Trt + Donor, data = eg)
> colnames(design3)
[1] "TrtA" "TrtB" "TrtC" "Donorb" "Donorc" "Donord" "Donore"
> makeContrasts(B_A = TrtB - TrtA,
+ C_A = TrtC - TrtA,
+ C_B = TrtC - TrtB, levels = design3)
Contrasts
Levels B_A C_A C_B
TrtA -1 -1 0
TrtB 1 0 -1
TrtC 0 1 1
Donorb 0 0 0
Donorc 0 0 0
Donord 0 0 0
Donore 0 0 0
Thanks! Great, that's what I was hoping.