I have Illumina HT-12 microarray data from a crossover study. There are samples at study start, after treatment period 1, and after treatment period 2. All subjects had samples at each time point. All subjects had treatment A and treatment B (half of the subjects had Aduring treatment period 1 followed by B during treatment period 2; half had B first followed by A). We are interested in the contrast of Treatment A relative to baseline (study enrollment timepoint, before any treatment) and Treatment B relative to baseline.
I noticed that the p-values reported by topTable()
differ for treatment A depending on whether or not the treatment B samples are present in the ExpressionSet
. For example:
# Function for fitting the model
make_fit <- function(data) {
trt_ <- factor(pData(data)$TreatmentCode)
ptid <- factor(pData(data)$Subject)
design <- model.matrix(~ 0 + trt_ + Subject)
fit <- lmFit(data, design)
contrasts <- makeContrasts(
A = trt_A - trt_Baseline,
levels = design)
fit <- eBayes(contrasts.fit(contrasts, fit = fit))
topTable(fit, coef = "A")
}
complete <- data # all of the data (baseline, treatment A and treatment B)
# just the baseline and treatment A samples
baseline_and_A <- data[ , pData(data)$TreatmentCode %in% c("Baseline", "A")]
# apply the make_fit function defined above to the completed and subsetted data
make_fit(complete)
make_fit(baseline_and_A)
The p-values are greater for treatment A when I fit the model on the ExpressionSet
containing only the baseline and treatment A samples (baseline_and_A
) than the p-values when I fit the model on the ExpressionSet
including baseline, treatment A, and treatment B samples (complete
). The fold changes are identical.
Why does the presence of the treatment B samples change the p-values for treatment A, given that the treatment B samples are not involved in the contrast? I am trying to determine which method is more appropriate to report.
Thank you in advance.
Thank you very much for your helpful explanation!