I wonder if the design/contrast I'm using in Limma is the best one for my experiment. In particular, I'm not sure of how
duplicateCorrelation works here. I though I'd post the design here and hopefully someone can give me some hints.
The research question is quite straight forward, but there are some complications with the design. The experiment studies treated/untreated (male and female) with a drug and then before/after an intervention. The questions we'd like to answer are:
How does the transcriptional profile differ between treated/untreated?
Are there genes that respond to the intervention differently in treated/untreated? This is the most relevant question.
Are there sex differences in the response to the intervention?
Then there are two complications:
The samples were taken in 5 batches, and those are confounded with sex (3 for male and two for female). The female samples were also generated by another hospital and much later, so the batch effect between (1/2/3) and (4/5) is larger than between 1/2/3 or 4/5.
For male, both samples for a subject (i.e. the before/after intervention samples) are analysed in the same batch. For female, the samples have been randomized to batch, so there are some subjects which have their "before" sample in one batch and their "after" sample in another.
We need to account for a number of covariates; age, BMI and so on. Importantly, we'd like to account for a type of medication that's only being used by a subset of the treated subjects (not the same treatment that the study is about). All these factors are properties of subjects rather than samples.
My current reasoning is that we cannot say anything about sex differences, since it's confounded by batch. Blocking by subject would be fine if we were only interested in the effect of the intervention, but then we cannot say anything about the interaction effect with treated/untreated (since they would be colinear). It would also fail to handle the batch effect for those subjects where the before/after samples were in different batches. Also, we'd like to quantify the effect of the covariates as a sanity check. Based on this, the only solution I could see was to view subject as a random effect. The model I'm using is:
design <- model.matrix(~ 0 + factor(intervention) + factor(treated_untreated) + factor(batch) + bmi + factor(other_medication)) corfit <- duplicateCorrelation(exprs(es), design, block = pData(es)$Subject) fit <- lmFit(exprs(es), design, block = pData(es)$Subject, correlation = corfit$consensus)
other_medication is a confounding medication that I'd like to remove the effect of. It's FALSE for all untreated and some of the treated. I then test for differentially expressed genes with contrast, e.g.
contrast <- makeContrasts(interventionAfter - interventionBefore, levels=design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2, trend = TRUE)
Does this make sense? Will
other_medication be corrected for as I'd like even though there are no untreated subjects who are on that drug or will it get some of the "treated signal"? Are there any complications with using
duplicateCorrelaction like this when some paired samples are split between batches? Thanks a lot!