Question

design matrix for cross-over + repeated measures in limma?

0

Entering edit mode

Jenny Drnevich ★ 2.0k

@jenny-drnevich-2812

Last seen 11 months ago

United States

Hello,

I've been asked to analyze some microarray data from a fairly complex design involving a cross-over design with repeated measurements in each period. I've done a lot of searching on the internet, but haven't found much about the proper way to model this in regular R, let alone limma. The main effects of interest are diet (A and B) and an immune challenge (blood taken pre- and post-challenge). Each pig was subjected to one of the diets for 1 month, measured pre- and post-challenge, then allowed a wash-out period before being subjected to the other diet for a month and measured pre- and post-challenge. The targets file:

> targets <- data.frame(Sample = 1:32, Seq = rep(c("AB","BA"), each = 16),
+                       Diet = rep(c("A","B"), each = 8, 2),
+                       Time = rep(c("pre","post"), each = 4, 2),
+                       Period = rep(c(1,2,1), c(8,16,8)),
+                       Pig = c(rep(1:4,4), rep(5:8,4)))
>                      
> targets
   Sample Seq Diet Time Period Pig
1       1  AB    A  pre      1   1
2       2  AB    A  pre      1   2
3       3  AB    A  pre      1   3
4       4  AB    A  pre      1   4
5       5  AB    A post      1   1
6       6  AB    A post      1   2
7       7  AB    A post      1   3
8       8  AB    A post      1   4
9       9  AB    B  pre      2   1
10     10  AB    B  pre      2   2
11     11  AB    B  pre      2   3
12     12  AB    B  pre      2   4
13     13  AB    B post      2   1
14     14  AB    B post      2   2
15     15  AB    B post      2   3
16     16  AB    B post      2   4
17     17  BA    A  pre      2   5
18     18  BA    A  pre      2   6
19     19  BA    A  pre      2   7
20     20  BA    A  pre      2   8
21     21  BA    A post      2   5
22     22  BA    A post      2   6
23     23  BA    A post      2   7
24     24  BA    A post      2   8
25     25  BA    B  pre      1   5
26     26  BA    B  pre      1   6
27     27  BA    B  pre      1   7
28     28  BA    B  pre      1   8
29     29  BA    B post      1   5
30     30  BA    B post      1   6
31     31  BA    B post      1   7
32     32  BA    B post      1   8

The researchers are mainly interested in the interaction between diet and time, although they are also interested in the effects of Diet and Time (both main effects and pairwise within each level). I'm not sure how to specify the design matrix correctly for both the cross-over and the repeated measures. If there was just the post-challenge measurement, then I think the model for the simple cross-over design would be:

#subset samples
targ.post <- targets[targets$Time == "post",]

#make design matrix
d.cross.post <- model.matrix(~Period + Seq + Diet, data=targ.post)
colnames(d.cross.post)
#[1] "(Intercept)" "Period"      "SeqBA"       "DietB"

#Add pig in as random effect
cor.pig.post <- duplicateCorrelation(normValues[,targets$Time == "post"], 
                                       d.cross.post, block = targ.post$Pig)

#Fit model
fit.cross.post <- eBayes(lmFit(normValues[,targets$Time == "post"], d.cross.post, block = targ.post$Pig, cor = cor.pig.post$cor))

#Effect of interest, Diet B vs A:
topTable(fit.cross.post, coef = "DietB")

If I only took the first time period so it was a 2-way ANOVA with repeated measures on one factor, I could follow the suggestion in Section 3.5 of the edgeRUsersGuide() and model it this way:

#subset samples
targ.per1 <- targets[targets$Period == 1,]

#Re-number pigs within each diet:
targ.per1$Pig2 <- factor(rep(1:4,4))

#Re-level Time to have pre first:
targ.per1$Time <- relevel(targ.per1$Time, ref = "pre")

#make design matrix
d.rep.P1 <- model.matrix(~Diet*Time + Diet:Pig2, data=targ.per1)
colnames(d.rep.P1)
# [1] "(Intercept)"    "DietB"          "Timepost"       "DietB:Timepost"
# [5] "DietA:Pig22"    "DietB:Pig22"    "DietA:Pig23"    "DietB:Pig23"   
# [9] "DietA:Pig24"    "DietB:Pig24"

#Fit model:
fit.rep.P1 <- eBayes(lmFit(normValues[,targets$Period == 1], d.rep.P1))

#Follow Section 9.5.4 in limmaUsersGuide() to properly pull out my effects of interest...

What I can't figure out is how to properly put them together. Any help would be appreciated.

Thanks in advance,

Jenny

limma • 2.8k views

ADD COMMENT • link updated 10.2 years ago by Aaron Lun ★ 29k • written 10.2 years ago by Jenny Drnevich ★ 2.0k

score 3 · Answer 1 · 2015-10-07

You've got a pretty complex experimental setup, which makes me feel both envy and pity. Anyway; I think that once you establish diet-specific blocking factors for each pig, you no longer need to worry about the Period factor. Let's say we do something like this (some renaming and re-leveling, for aesthetics):

Pig <- factor(targets$Pig)
Diet <- factor(targets$Diet)
Time <- factor(targets$Time, levels=c("pre", "post"))
design <- model.matrix(~0 + Pig:Diet + Diet:Time)

Running colnames(design) gives us:

 [1] "Pig1:DietA"     "Pig2:DietA"     "Pig3:DietA"     "Pig4:DietA"    
 [5] "Pig5:DietA"     "Pig6:DietA"     "Pig7:DietA"     "Pig8:DietA"    
 [9] "Pig1:DietB"     "Pig2:DietB"     "Pig3:DietB"     "Pig4:DietB"    
[13] "Pig5:DietB"     "Pig6:DietB"     "Pig7:DietB"     "Pig8:DietB"    
[17] "DietA:Timepost" "DietB:Timepost"

The first 16 coefficients represent the pre-challenge expression for each pig/diet combination, while the last two represent the log-fold change induced by the challenge in each diet (i.e., the diet/time interaction of interest). There's no need to put in Period as a separate blocking factor, as it'll be fully absorbed by the various Pig:Diet terms. For example, running rowSum on columns 5 to 12 of design will yield a column equivalent to that from putting Period as an additive factor in the model construction.

If you want a bit more complexity, you could split the Diet:Time effects by period, i.e., define a separate term for the diet-specific post-challenge effect in each period. The design's construction is a bit messier, though:

Period <- factor(targets$Period)
design <- model.matrix(~0 + Pig:Diet + Diet:Time:Period)
design <- design[,!grepl("Timepre", colnames(design))] # get to full rank

Running colnames(design)[-(1:16)] indicates that the diet-specific effects have been split by period:

[1] "DietA:Timepost:Period1" "DietB:Timepost:Period1" "DietA:Timepost:Period2"
[4] "DietB:Timepost:Period2"

This allows you to study the differences between the cross-overs, if that's of any interest.

If you want to study the diet main effect, you'd probably have to use a different design matrix without so many interaction terms. In the above model, any comparison between diets is confounded because we're blocking on a pig/diet combination - each diet is assumed to affect each pig differently, so it wouldn't make much sense to ask for a main effect here. The same applies for the main effect of time. Obviously, even if you use a simpler design, it would probably only make sense for genes that don't have a significant interaction effect.

score 0 · Answer 2 · 2015-10-07

0

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 13 months ago

Icahn School of Medicine at Mount Sinai…

I think one way to do this might be to think of each pig as having 4 time points: pre1, post1, pre2, and post2. Then there are 8 conditions: 2 sequences times 4 "time points". Then you could make a model for these 8 groups using the group means parametrization:

group <- interaction(Seq, Time, Period, sep=".")
design <- model.matrix(~0+group)
colnames(design) <- levels(group)

Then you can formulate whatever contrasts you want to. For instance, the contrast for diet A pre vs post averaged over both A first and B first groups might be:

((AB.post.1 - AB.pre.1) + (BA.post.2 - BA.pre.2)) / 2

That's the mean of the first pre-vs-post difference in the AB pigs and the second pre-vs-post difference in the BA pigs. Now, you just have to add the columns to adjust for pig-to-pig variation, which is a bit tricky. I think the best way to do this would be to make separate designs for the blocking factors of each sequence group, then strip out the intercept columns from both, combine them into a block-diagonal matrix (using bdiag from the Matrix package), and then cbind that onto your existing design matrix.

Anyway, that would probably be my first strategy in analyzing this data, but I would be careful to check all my assumptions along the way.

ADD COMMENT • link 10.2 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Just to note: my approach models the pre-vs-post effects for a given diet with separate coefficients for each sequence group, which would allow you to determine whether there are significant differences depending on which diet comes first. If that's not something you care about, I believe Aaron's design produces a single pre-vs-post coefficient for each diet, whcih is the average of the effects from when that diet is first and second. This results in a simpler model and might be preferable if that's all you care about.

ADD REPLY • link 10.2 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

The design in my answer can also be extended to have a period- and diet-specific pre-vs-post log-fold change, as I've mentioned below. It's probably best used for checking that the order of the diets has no effect, i.e., that the diet-specific pre-vs-post log-fold change is the same between the different periods. In the long run, I'd imagine that most people would want to report the final diet-specific effects by averaging across the two periods and using all samples in the design.

ADD REPLY • link 10.2 years ago Aaron Lun ★ 29k

0

Entering edit mode

Thanks, Aaron and Ryan for your help! I guessed that the answer wasn't straightforward when some of my Googling pulled up a heavy publication in a Statistics journal (1999) and a German dissertation (2004), both of which were over my head (don't have the links anymore, but I could try to find them if you want them). I've tried both the approaches, but it turns out that there just aren't any diet effects or diet:time interactions no matter how you slice it. This was dropped on me to find some sort of magical statistical solution after the researchers spent over a year trying to analyze it on their own. When they gave it to me, they described it as a simple 2x2 factorial design! It wasn't until after an initial analysis when I started asking them about an apparent batch effect in the data (turned out to be period) that they even mentioned the repeated measures per pig, and then more back and forth until they said it was a cross-over design!

Well, at least I learned a lot in all of this - I've been analyzing microarray & RNA-Seq data for 10 years, and probably only a dozen times have I gotten anything more complex than a 2way ANOVA.

Cheers,

Jenny

ADD REPLY • link 10.2 years ago Jenny Drnevich ★ 2.0k