Question: paired two groups comparison using limma
1
fansili201310 wrote:

Hi,

I am comparing two paired groups. Should I follow 9.4.1 paired samples in the user guide or 9.7 in the user guide? [I know 9.7 is for mixed of independent and dependent factors. But two paired group is a special case of 0 independent and 1 dependent factor, so 9.7 should be right also for two paired groups.] I actually think 9.7 is the mixed effect model, thus 9.4.1 (paired t test) should be a special case of 9.7. But from the following test, I am apparently wrong somewhere.

My small simulation study is following,

y = matrix(rnorm(2*10), nrow=2)

treat = factor(rep(c("T1","T2"),5))
subject_id = rep(c("A","B","C","D","E"), each = 2)
design <- model.matrix(~0+treat)
colnames(design) <- levels(treat)
# 1
corfit <- duplicateCorrelation(y,design,block=subject_id)
fit <- lmFit(y,design, block = subject_id, correlation = corfit\$consensus)
cm <- makeContrasts(
Diff = T2-T1,
levels=design)
fit2 <- contrasts.fit(fit, cm)
fit2 <- eBayes(fit2)
topTable(fit2, coef="Diff")
logFC    AveExpr         t    P.Value adj.P.Val         B
1 -0.7933632 -0.1601572 -1.837463 0.08478521 0.1695704 -4.118469
2 -0.3660406  0.1586736 -1.025867 0.32021504 0.3202150 -4.955690
# 2
design <- model.matrix(~subject_id+treat)
fit <- lmFit(y, design)
fit <- eBayes(fit)
topTable(fit, coef="treatT2")
logFC    AveExpr         t   P.Value adj.P.Val         B
1 -0.7933632 -0.1601572 -1.668410 0.1345112 0.2690224 -4.295977
2 -0.3660406  0.1586736 -1.166903 0.2774725 0.2774725 -4.686874
# 3
t.test(y[1,]~treat, paired = TRUE)

t = 1.4203, df = 4, p-value = 0.2286

They return different p value. Why? And which one is more appropriate?

Answer: paired two groups comparison using limma
1
Aaron Lun25k wrote:

For convenience, I will denote the duplicateCorrelation approach as model X, and the design matrix blocking as model Y.

In terms of the model you are fitting, model Y is not a special case of X. As you may already appreciate, these two models use different approaches to modelling the batch effect. X treats the batch as a random effect, while Y treats it as a fixed effect. This involves different statistical theory - X uses generalized least squares, while Y uses standard least squares - so it's no surprise that you get different p-values.

In a transcriptome-wide DE context, model X assumes that the correlation in the residuals induced by the batch effect is the same across all genes. This is necessary to obtain a stable estimate of the correlation but can result in some anticonservativeness when the assumption is violated. In contrast, model Y estimates separate batch coefficients for each gene, which avoids the above assumption. However, this uses up information in estimating the batch terms, which could otherwise contribute to the estimation of your factors of interest.

This has some consequences for the downstream analysis. Specifically, the p-values obtained from model X will usually be a bit smaller, which is due to a combination of increased power and some anticonservativeness, as discussed above. I've always erred on the side of conservativeness and used model Y if my batches are orthogonal to the conditions of interest, i.e., classical blocking in the design matrix. However, model X is the only option if the batches are nested within conditions to be compared.

Thank you Aaron. If I understand this correctly, the model X is not the mixed effect model?

Well, X is analogous to a mixed effect model. There are some technical differences relating to how the model is ultimately fitted (e.g., GLS vs REML), and also in how the variance of the random batch effect is estimated.

However, these differences are irrelevant to your original question. Even if X was a fully fledged mixed effect model, there is no reason that it would yield the same p-value as model Y. The fact remains that the batch effect is being modelled in an entirely different manner between X and Y, so the p-values should still be different. As far as I can remember, the theory for hypothesis testing in mixed effect models is a lot messier than that for standard linear models, necessitating different tests entirely.

Don't believe me? Let's whip out an example with lme4:

set.seed(10)
y <- rnorm(10)
treat <- factor(rep(c("T1","T2"),5))
subject_id <- rep(c("A","B","C","D","E"), each = 2)

# Linear mixed model:
library(lme4)
full.m <- lmer(y ~ treat + (1|subject_id))
null.m <- lmer(y ~ (1|subject_id))
anova(full.m, null.m) # p-value of 0.05818

# Linear model (fixed effects only):
full.f <- lm(y ~ treat + subject_id)
null.f <- lm(y ~ subject_id)
anova(full.f, null.f) # p-value of 0.1098 

So while the concept of a fixed effects model is a specialization of the concept a mixed effects model, the specific fixed effects model used in lm above is not a specialized case of the model used in lmer.