Question

Design and Contrast Matrix for Linear Model. To Combine or not to Combine?

1

Entering edit mode

mikhael.manurung ▴ 280

@mikhaelmanurung-17423

Last seen 3.5 years ago

Netherlands

Dear all,

I need your advice in designing contrast and design matrices for use in limma.

My study consists of 2 groups that are followed longitudinally at three time points (repeated measures):

Group has 2 factors, i.e. groupA and groupB.
Timepoint has 3 factors, i.e. t0, t1, and t2.

After browsing the user guide and the Bioc forum, I understand that I can combine my Group and Timepoint variables into one variable:

group_time <- paste(Group, Timepoint, sep="_")

Thus, for the group_time variable, I have 6 factors: groupA_t0, groupA_t1, groupA_t2, groupB_t0, and so on. And then I make my design matrix in the following way:

design1 <- model.matrix(~ 0+ group_time)

I am interested to see the changes between the time points within each of the groups. Therefore, I formulate my contrast matrix in such a way:

t0t1_inGroupA = groupA_t0 - groupAt1,
t0t2_inGroupA = groupA_t0 - groupAt2,
t0t1_inGroupB = groupB_t0 - groupBt1,
t0t2_inGroupB = groupB_t0 - groupBt2,
levels=design

I really love this way because of its sheer simplicity!

I showed this method of making design and contrast matrices to a statistician colleague and he insisted to alter my design matrix to:

design2 <- model.matrix(~ Group * Timepoint).

He stated that design2 is way better because now it incorporates an interaction term between my Group and Timepoint variable. But then now I am having a hard time to create a contrast matrix to do the same comparison as the above.

My question is, would there be any difference between the results? From what I understand, contrast matrix is simply a way to 'switch on and off' which samples that I want to compare. For now, I cannot really know because I still haven't succeeded in making the appropriate contrast matrix for the new design matrix.

Thank you for your advice.

Best regards,

Mikhael

limma design and contrast matrix design matrix contrast matrix • 2.1k views

ADD COMMENT • link updated 7.2 years ago by James W. MacDonald 68k • written 7.2 years ago by mikhael.manurung ▴ 280

score 3 · Answer 1 · 2018-09-28

3

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 15 hours ago

United States

You can still get any interaction you might care about using the first parameterization. For example, if you want to know the genes that have an interaction between time and group between time 0 and time 1, it's just

makeContrasts(int01 = (groupA_t0 - groupAt1) - (groupB_t0 - groupBt1), levels = design)

And analogously for the changes between e.g., time 0 and time 2 or time 1 and time 2. What your statistician friend neglected to note is that the other contrasts you are interested in are more difficult to extract using his parameterization as compared to yours, whereas adding in the interactions isn't difficult with your parameterization.

ADD COMMENT • link 7.2 years ago James W. MacDonald 68k

1

Entering edit mode

This is covered to some extent by Section 9.5 of the limma user's guide:

All the approaches considered are equivalent and yield identical bottom-line results.

... with the minor caveat that some results change slightly when you have weights, due to an approximation in contrasts.fit - see the Note in ?contrasts.fit. But James is right, your original single-factor design is much easier to work with. The two-factor model is the classical formulation that gets taught in statistics classes, but is not particularly useful here.

ADD REPLY • link 7.2 years ago Aaron Lun ★ 29k

0

Entering edit mode

I really appreciate your responses! If you don't mind, I would like to further test my understanding of the contrast matrix. Suppose that I want to do an Anova-like tests for all time points of observation for group A (i.e. overall DEGs for group A). Then, is it correct if I design my contrast as such:

makeContrasts(aovA = groupA_t0 + groupA_t1 + groupA_t2)/3, levels=design)

Thank you for taking the time.

ADD REPLY • link 7.2 years ago mikhael.manurung ▴ 280

0

Entering edit mode

No, that would test the hypothesis that the average of the groupA data is equal to zero. Which would be rejected for most of the genes.

Remember that the coefficients in your parameterization are simply computing the mean of each group, which isn't really of interest (it's just a highly processed number that doesn't really mean much). It's the differences between groups that you care about. So if you wanted to test the hypothesis that there are no differences between any time points for groupA, using the contrasts matrix you specified above, and assuming your MArrayLM object is called 'fit2' (because what else would you call it - you've read the limma User's Guide, no?).

topTable(fit2, 1:2)

Will do an F-test using contrasts 1 and 2.

ADD REPLY • link 7.2 years ago James W. MacDonald 68k

0

Entering edit mode

Speaking of F-test, I actually asked another question in this forum: nestedF in decideTests: an analogue of Anova with post-hoc t-tests?

Briefly, would the above command yield the same result as using decideTests with "method = nestedF" argument?

ADD REPLY • link 7.2 years ago mikhael.manurung ▴ 280