Hi Bioconductor Community,
I would like to confirm that I have constructed my design matrix appropriately and am testing the correct coefficients/contrasts given my biological questions.
My experiment is similar to Section 3.5 "Comparisons both between and within subjects" of the edgeR user guide. I found a related question asked here and so have set up my design matrix based on the answers from this post to avoid the use of interaction terms (which I find confusing). This post is also related, as we have very similar research questions.
Experimental design and design matrix
Here is a reproducible example of my experimental design and design matrix. Note that the Control
and Diseased
patients are different individuals.
(targets <-
data.frame(Disease = c(rep("Control", 6), rep("Diseased", 6)),
Patient = c(paste0("C", (gl(3, 2, 6))),
paste0("D", (gl(3, 2, 6)))),
Tissue = rep(c("Blood", "Bone"), 6)))
Disease Patient Tissue
1 Control C1 Blood
2 Control C1 Bone
3 Control C2 Blood
4 Control C2 Bone
5 Control C3 Blood
6 Control C3 Bone
7 Diseased D1 Blood
8 Diseased D1 Bone
9 Diseased D2 Blood
10 Diseased D2 Bone
11 Diseased D3 Blood
12 Diseased D3 Bone
Patient <- targets$Patient
Disease <- targets$Disease
Tissue <- targets$Tissue
Group <- paste0(Disease, ".", Tissue)
design <- model.matrix(~Patient + Group)
# Get to full rank
design <- design[,!grepl("Blood$", colnames(design))]
colnames(design)
[1] "(Intercept)" "PatientC2" "PatientC3" "PatientD1" "PatientD2" "PatientD3"
[7] "GroupControl.Bone" "GroupDiseased.Bone"
Biological questions
I have three main biological questions. Am I using the correct approaches to answer them?
Q1) To find genes differentially expressed (DE) between Control Bone vs. Control Blood:
glmLRT(fit, coef = "GroupControl.Bone")
Q2) To find genes DE between Diseased Bone vs. Diseased Blood:
glmLRT(fit, coef = "GroupDiseased.Bone")
Q3) And most importantly, to find genes DE between Diseased Bone vs. Diseased Blood (as per Q2), but ensure they are unique to the diseased condition (i.e. not DE in Control Bone vs. Control Blood as per Q1):
Find the intersect (e.g. using dplyr::full_join
) of the DE gene lists from Q1 and Q2 (after filtering each list based on FDR and logFC) to determine which genes are common and unique to each group.
Or is there a better approach to answer Q3? Do I need to use limma::duplicateCorrelation
? As I don't believe this contrast answers Q3:
myContrast <- makeContrasts(GroupDiseased.Bone - GroupControl.Bone, levels = design)
Rather, I believe it tests whether the differences between Bone and Blood are the same between Diseased and Control patients, which is a slightly different question.
Your help and advice would be greatly appreciated!
Many thanks, Rebecca
Excellent, thank you so much for your help Gordon!
As a follow up question for my understanding: Given my approaches to answering Q1 and Q2 above are correct, how does this work? Because here, the intercept term represents
PatientC1
, yes? Yet if we specify the coefficient "GroupControl.Bone" inglmLRT
it compares to "GroupControl.Blood" even though this is not the intercept term. And the same is true when specifying the coefficient "GroupDisease.Bone", it compares to "GroupDisease.Blood". Is this just a feature of additive models that I have to remember? Or how can I check what each coefficient represents, given each group doesn't have it's own coefficient?PS. Yes I should have explained how I would use
full_join
in this context, see below. I get the same result as your approach, it's just a longer way to get there. I definitely prefer your way!You're asking me to explain your own design matrix? Including
Patient
in the linear model adjusts for the baseline level of al the patients. It makes no difference which one is the intercept. Indeed the results would be unchanged if you removed the intercept altogether by0+Patient+Group
.Basically, you are simply doing two paired comparisons, one for the Control treatment and one for Disease treatment. The design matrix is the two paired-comparison design matrices merged together.
My apologies, I don't think I'm explaining my question very well. I am used to constructing very simple design matrices like
~0 + Group
where "Group" is one factor with two or more levels, so all the groups have their own coefficients that can easily be accessed usingmakeContrasts
. But in my case, where "Group" is a combination of two factors, not all groups have their own coefficient (i.e. coefficients for"GroupControl.Blood"
and"GroupDiseased.Blood"
don't exist). So it isn't clear to me how the last two coefficients ("GroupControl.Bone" and "GroupDiseased.Bone") represent the logFC of Control Bone over Control Blood, and logFC of Diseased Bone over Diseased Blood, respectively. I came across the explanation of the coefficients by reading other posts on Bioconductor, but I don't understand how it works.I think this is close to the answer I'm seeking:
If it's too difficult to explain more than this, could you please suggest some further reading? Thanks again for your help, I really appreciate your time.