We performed Illumina HT-12 microarrays on samples from a three-treatment crossover study. We want to determine what effects the treatments had on gene expression relative to baseline. I am hoping for some help with setting up the design matrix. There were 36 subjects, each of whom contributed a baseline sample and then one sample at the end of each treatment period for a total of 144 samples. All subjects received treatments A, B, and C. Five received order BC, seven received order BAC, and six each received the treatments in the other four orders. Each treatment period was equal length, with a wash-out period in between.
The samples look like this:
## Subject Visit Treatment Order ## 1 1 1 Baseline CAB ## 2 1 2 C CAB ## 3 1 3 A CAB ## 4 1 4 B CAB ## 5 2 1 Baseline CBA ## 6 2 2 C CBA ## 7 2 3 B CBA ## 8 2 4 A CBA ## 9 3 1 Baseline BCA ## 10 3 2 B BCA
We are interested in the effect of treatment A relative to baseline, treatment B relative to baseline, and treatment C relative to baseline. My initial thought to set up the design matrix was like this:
# create design matrix trt_ <- factor(mapping$Treatment) subj <- factor(mapping$Subject) design <- model.matrix(~ 0 + trt_ + subj) head(design) ## trt_Baseline trt_A trt_B trt_C subj2 subj3 subj4 subj5 subj6 subj7 subj8 ## 1 1 0 0 0 0 0 0 0 0 0 0 ## 2 0 0 0 1 0 0 0 0 0 0 0 ## 3 0 1 0 0 0 0 0 0 0 0 0 ## 4 0 0 1 0 0 0 0 0 0 0 0 ## 5 1 0 0 0 1 0 0 0 0 0 0 ## 6 0 0 0 1 1 0 0 0 0 0 0 ## subj9 subj10 subj11 subj12 subj13 subj14 subj15 subj16 subj17 subj18 ## 1 0 0 0 0 0 0 0 0 0 0 ## 2 0 0 0 0 0 0 0 0 0 0 ## 3 0 0 0 0 0 0 0 0 0 0 ## 4 0 0 0 0 0 0 0 0 0 0 ## 5 0 0 0 0 0 0 0 0 0 0 ## 6 0 0 0 0 0 0 0 0 0 0 ## subj19 subj20 subj21 subj22 subj23 subj24 subj25 subj26 subj27 subj28 ## 1 0 0 0 0 0 0 0 0 0 0 ## 2 0 0 0 0 0 0 0 0 0 0 ## 3 0 0 0 0 0 0 0 0 0 0 ## 4 0 0 0 0 0 0 0 0 0 0 ## 5 0 0 0 0 0 0 0 0 0 0 ## 6 0 0 0 0 0 0 0 0 0 0 ## subj29 subj30 subj31 subj32 subj33 subj34 subj35 subj36 ## 1 0 0 0 0 0 0 0 0 ## 2 0 0 0 0 0 0 0 0 ## 3 0 0 0 0 0 0 0 0 ## 4 0 0 0 0 0 0 0 0 ## 5 0 0 0 0 0 0 0 0 ## 6 0 0 0 0 0 0 0 0 # fit models fit <- lmFit(filtered, design) # create contrasts contrasts <- makeContrasts( A = trt_A - trt_Baseline, B = trt_B - trt_Baseline, C = trt_C - trt_Baseline, levels = design) contrasts_fit <- contrasts.fit(contrasts, fit = fit) finished_fit <- eBayes(contrasts_fit)
My overall question is whether this is a reasonable way to analyze these data.
More specifically:
- Should I include the order of treatments in the design?
- Should I include the visit in the design?
Thanks in advance for your help!
Thank you very much for taking the time to respond. Your answer was helpful and clear. I have taken a look at a few of the combinations of treatments and orders as you advised and they haven't turned up anything, so most likely I will revert to the simpler model once I have time to examine all of the combinations and confirm that there are mostly no effects of order on the treatment effects.
Also, I like your picture :)