Hi, I have a question about contrasts for more complex GLM models in edgeR.
Taking the section 3.5 example as a starting point
> targets <- data.frame(Disease = factor(rep(c("Healthy","Disease1","Disease2"), each = 6)),
+ Patient = factor(rep(c(1:9), each=2)),
+ Treatment = factor(rep(c("None","Hormone"), 9)))
> Patient <- gl(3,2,length = 18)
> Disease <- factor(targets$Disease, levels = c("Healthy","Disease1","Disease2"))
> Treatment <- factor(targets$Treatment, levels = c("None","Hormone"))
> data.frame(Disease, Patient, Treatment)
Disease Patient Treatment
1 Healthy 1 None
2 Healthy 1 Hormone
3 Healthy 2 None
4 Healthy 2 Hormone
5 Healthy 3 None
6 Healthy 3 Hormone
7 Disease1 1 None
8 Disease1 1 Hormone
9 Disease1 2 None
10 Disease1 2 Hormone
11 Disease1 3 None
12 Disease1 3 Hormone
13 Disease2 1 None
14 Disease2 1 Hormone
15 Disease2 2 None
16 Disease2 2 Hormone
17 Disease2 3 None
18 Disease2 3 Hormone
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
[1] "(Intercept)" "DiseaseDisease1" "DiseaseDisease2" "DiseaseHealthy:Patient2" "DiseaseDisease1:Patient2"
[6] "DiseaseDisease2:Patient2" "DiseaseHealthy:Patient3" "DiseaseDisease1:Patient3" "DiseaseDisease2:Patient3" "DiseaseHealthy:TreatmentHormone"
[11] "DiseaseDisease1:TreatmentHormone" "DiseaseDisease2:TreatmentHormone"
My question is how to configure the contrasts/coefficients chosen to find the equivalent of this:
(DiseaseDisease1:TreatmentHormone - DiseaseDisease1:TreatmentNone) - (DiseaseHealthy:TreatmentHormone - DiseaseHealthy:TreatmentNone)
i.e. those genes where the change of expression induced by the hormone is different between Disease1 and healthy.
There is a line in the example which says it will find genes that respond differently to the hormone in disease1 vs healthy patients, so sounds as though it is perfect, however I am not familiar enough with GLMs to decide if it is doing what I want:
qlf <- glmQLFTest(fit,contrast = c(0,0,0,0,0,0,0,0,0,-1,1,0)
This would appear to be performing:
DiseaseDisease1:TreatmentHormone - DiseaseHealthy:TreatmentHormone
Rather than the difference in the response to the hormone, is it not just finding the difference in the treated Disease1 vs the treated healthy? So any changes between disease and healthy without the hormone are not accounted for - I suspect this is due to me not understanding what the intercept represents and how it is used in the contrasts.
Thanks in advance!!!
On a related note is there a good resource for learning how to fit GLMs - so I can understand how the coefficients work with and without intercepts etc? I have read the sections in the edgeR and Limma manuals which are very informative but I don't understand the concepts behind how the various models have been fit and contrasts/coefficients chosen.
Thanks for your very informative answer!
For future reference, do you know of a resource which further explores how to deduce the meanings of the coefficients, such that when planning future experiments I do not need to ask for further clarification?
Kind regards and many thanks,
Dean
Then you should refer to this link for in-depth reading on contrast matrix.
There isn't a good reference at the level you're looking for, which is why we try so hard to explain things explicitly in the limma and edgeR User Guides. If you don't understand at all the role of intercepts in R model formulae, then it would be best to start with Section 9.2 in the limma User's Guide rather than a very complex design like the one above. The same principle used in Section 9.2 is applied over and over again in complex designs. The edgeR Section 3.5 formula is quite advanced and quite specialized and I've never seen a statistical textbook discuss anything like it.
In general, we recommend you use the group-means parametrization that has explicit contrast interpretations. We pushed this approach in limma and edgeR because it is otherwise so easy to misinterpret model formula in R. Again, though, it is not something that is discussed in statistical textbooks.