Dear all,
I need your advice in designing contrast and design matrices for use in limma.
My study consists of 2 groups that are followed longitudinally at three time points (repeated measures):
- Group has 2 factors, i.e. groupA and groupB.
- Timepoint has 3 factors, i.e. t0, t1, and t2.
After browsing the user guide and the Bioc forum, I understand that I can combine my Group and Timepoint variables into one variable:
group_time <- paste(Group, Timepoint, sep="_")
Thus, for the group_time variable, I have 6 factors: groupA_t0, groupA_t1, groupA_t2, groupB_t0, and so on. And then I make my design matrix in the following way:
design1 <- model.matrix(~ 0+ group_time)
I am interested to see the changes between the time points within each of the groups. Therefore, I formulate my contrast matrix in such a way:
t0t1_inGroupA = groupA_t0 - groupAt1,
t0t2_inGroupA = groupA_t0 - groupAt2,
t0t1_inGroupB = groupB_t0 - groupBt1,
t0t2_inGroupB = groupB_t0 - groupBt2,
levels=design
I really love this way because of its sheer simplicity!
I showed this method of making design and contrast matrices to a statistician colleague and he insisted to alter my design matrix to:
design2 <- model.matrix(~ Group * Timepoint).
He stated that design2 is way better because now it incorporates an interaction term between my Group and Timepoint variable. But then now I am having a hard time to create a contrast matrix to do the same comparison as the above.
My question is, would there be any difference between the results? From what I understand, contrast matrix is simply a way to 'switch on and off' which samples that I want to compare. For now, I cannot really know because I still haven't succeeded in making the appropriate contrast matrix for the new design matrix.
Thank you for your advice.
Best regards,
Mikhael
This is covered to some extent by Section 9.5 of the limma user's guide:
All the approaches considered are equivalent and yield identical bottom-line results.
... with the minor caveat that some results change slightly when you have weights, due to an approximation in
contrasts.fit
- see the Note in?contrasts.fit
. But James is right, your original single-factor design is much easier to work with. The two-factor model is the classical formulation that gets taught in statistics classes, but is not particularly useful here.I really appreciate your responses! If you don't mind, I would like to further test my understanding of the contrast matrix. Suppose that I want to do an Anova-like tests for all time points of observation for group A (i.e. overall DEGs for group A). Then, is it correct if I design my contrast as such:
Thank you for taking the time.
No, that would test the hypothesis that the average of the groupA data is equal to zero. Which would be rejected for most of the genes.
Remember that the coefficients in your parameterization are simply computing the mean of each group, which isn't really of interest (it's just a highly processed number that doesn't really mean much). It's the differences between groups that you care about. So if you wanted to test the hypothesis that there are no differences between any time points for groupA, using the contrasts matrix you specified above, and assuming your MArrayLM object is called 'fit2' (because what else would you call it - you've read the limma User's Guide, no?).
Will do an F-test using contrasts 1 and 2.
Speaking of F-test, I actually asked another question in this forum: nestedF in decideTests: an analogue of Anova with post-hoc t-tests?
Briefly, would the above command yield the same result as using decideTests with "method = nestedF" argument?