Limma design matrix for three-treatment crossover study
1
0
Entering edit mode
smhughes ▴ 10
@smhughes-11737
Last seen 3.6 years ago

We performed Illumina HT-12 microarrays on samples from a three-treatment crossover study. We want to determine what effects the treatments had on gene expression relative to baseline. I am hoping for some help with setting up the design matrix. There were 36 subjects, each of whom contributed a baseline sample and then one sample at the end of each treatment period for a total of 144 samples. All subjects received treatments A, B, and C. Five received order BC, seven received order BAC, and six each received the treatments in the other four orders. Each treatment period was equal length, with a wash-out period in between.

The samples look like this:

    ##    Subject Visit Treatment Order
## 1        1     1  Baseline   CAB
## 2        1     2         C   CAB
## 3        1     3         A   CAB
## 4        1     4         B   CAB
## 5        2     1  Baseline   CBA
## 6        2     2         C   CBA
## 7        2     3         B   CBA
## 8        2     4         A   CBA
## 9        3     1  Baseline   BCA
## 10       3     2         B   BCA

We are interested in the effect of treatment A relative to baseline, treatment B relative to baseline, and treatment C relative to baseline. My initial thought to set up the design matrix was like this:

    # create design matrix
trt_ <- factor(mapping$Treatment) subj <- factor(mapping$Subject)
design <- model.matrix(~ 0 + trt_ + subj)

##   trt_Baseline trt_A trt_B trt_C subj2 subj3 subj4 subj5 subj6 subj7 subj8
## 1            1     0     0     0     0     0     0     0     0     0     0
## 2            0     0     0     1     0     0     0     0     0     0     0
## 3            0     1     0     0     0     0     0     0     0     0     0
## 4            0     0     1     0     0     0     0     0     0     0     0
## 5            1     0     0     0     1     0     0     0     0     0     0
## 6            0     0     0     1     1     0     0     0     0     0     0
##   subj9 subj10 subj11 subj12 subj13 subj14 subj15 subj16 subj17 subj18
## 1     0      0      0      0      0      0      0      0      0      0
## 2     0      0      0      0      0      0      0      0      0      0
## 3     0      0      0      0      0      0      0      0      0      0
## 4     0      0      0      0      0      0      0      0      0      0
## 5     0      0      0      0      0      0      0      0      0      0
## 6     0      0      0      0      0      0      0      0      0      0
##   subj19 subj20 subj21 subj22 subj23 subj24 subj25 subj26 subj27 subj28
## 1      0      0      0      0      0      0      0      0      0      0
## 2      0      0      0      0      0      0      0      0      0      0
## 3      0      0      0      0      0      0      0      0      0      0
## 4      0      0      0      0      0      0      0      0      0      0
## 5      0      0      0      0      0      0      0      0      0      0
## 6      0      0      0      0      0      0      0      0      0      0
##   subj29 subj30 subj31 subj32 subj33 subj34 subj35 subj36
## 1      0      0      0      0      0      0      0      0
## 2      0      0      0      0      0      0      0      0
## 3      0      0      0      0      0      0      0      0
## 4      0      0      0      0      0      0      0      0
## 5      0      0      0      0      0      0      0      0
## 6      0      0      0      0      0      0      0      0

# fit models
fit <- lmFit(filtered, design)

# create contrasts
contrasts <- makeContrasts(
A = trt_A - trt_Baseline,
B = trt_B - trt_Baseline,
C = trt_C - trt_Baseline,
levels = design)

contrasts_fit <- contrasts.fit(contrasts, fit = fit)

finished_fit <- eBayes(contrasts_fit)

My overall question is whether this is a reasonable way to analyze these data.

More specifically:

• Should I include the order of treatments in the design?
• Should I include the visit in the design?

1
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

Should you include the order of treatments? Well, presumably whoever designed the experiment thought it was important, otherwise they wouldn't have bothered to use different treatment orders. So, you might as well put it in the model and see if it affects anything. I would do something like this:

ord <- factor(mapping\$Order)
design <- model.matrix(~ 0 + subj + trt_:ord)
design <- design[,!grepl("Baseline", colnames(design))] # get to full rank.


This should yield a design matrix where the interaction terms represent the log-fold change over baseline for each treatment/order combination. You can then test for DE between log-fold changes for the same treatment but different orders, to see if order has any impact on the treatment effect. For example:

con <- makeContrasts(trt_A_ordBCA - trt_A_ordCAB, trt_A_ordBCA - trt_A_ordABC,
levels=design) # Replace all ":" with "_" in colnames(design)

If this comparison yields a negligible number of DE genes, then you could probably revert to the simpler model. As for the visit factor, you don't need that, because that information contained therein is redundant with that in the order.

0
Entering edit mode

Thank you very much for taking the time to respond. Your answer was helpful and clear. I have taken a look at a few of the combinations of treatments and orders as you advised and they haven't turned up anything, so most likely I will revert to the simpler model once I have time to examine all of the combinations and confirm that there are mostly no effects of order on the treatment effects.

Also, I like your picture :)