Question: [limma] Define contrasts for model with covariates
0
16 months ago by
enricoferrero570
Switzerland
enricoferrero570 wrote:

I'm having a hard time understanding how to pull out the relevant contrasts when using a model with covariates in limma.

Let's say I have the following gene expression matrix and experimental variables:

library(limma)
set.seed(16)

mat <- matrix(rnorm(600), 100, 6)

age <- c(45, 55, 47, 56, 43, 56)
gender <- factor(c("M", "F", "F", "F", "M", "F"))
condition <- factor(c(rep("healthy", 3), rep("disease", 3)), levels = c("healthy", "disease"))

If I were to use a very simple design (~ condition) then I could easily identify genes differentially expressed in disease vs healthy both using a treatment-contrast parametrization (case A below) or using a group-means parametrization and manually defining the contrast (case B below):

# case A: simple design; treatment-contrast parametrization
design_A <- model.matrix(~ condition)
fit_A <- lmFit(mat, design_A)
fit_A <- eBayes(fit_A)
tt_A <- topTable(fit_A, coef = "conditiondisease")

# case B: simple design; group-means parametrization
design_B <- model.matrix(~ 0 + condition)
fit_B <- lmFit(mat, design_B)
cont.matrix_B <- makeContrasts(diseaseVShealthy = conditiondisease - conditionhealthy, levels = design_B)
fit_B <- contrasts.fit(fit_B, cont.matrix_B)
fit_B <- eBayes(fit_B)
tt_B <- topTable(fit_B, coef = "diseaseVShealthy")

In fact, in this case the tt_A and tt_B tables contain the same results.

However, if I want to use a more complex design (~ age + gender + condition), I can still do the treatment-contrast parametrization (case C below) but I can no longer do the group-means parametrization because I'm missing the conditionhealthy column in the design matrix (case D below):

# case C: complex design; treatment-contrast parametrization
design_C <- model.matrix(~ age + gender + condition)
fit_C <- lmFit(mat, design_C)
fit_C <- eBayes(fit_C)
tt_C <- topTable(fit_C, coef = "conditiondisease")

# case D: simple design; group-means parametrization
design_D <- model.matrix(~ 0 + age + gender + condition)
fit_D <- lmFit(mat, design_D)
cont.matrix_D <- makeContrasts(diseaseVShealthy = conditiondisease - conditionhealthy, levels = design_D)

Error in eval(ej, envir = levelsenv) :

#fit_D <- contrasts.fit(fit_D, cont.matrix_D)
#fit_D <- eBayes(fit_D)
#tt_D <- topTable(fit_D, coef = "diseaseVShealthy")

I would like to use the group-means parametrization with the more complex design because I think it's much clearer what's going on.

How can I do that?

Thank you!

P.S.: Note that I also don't completely understand the expressions "treatment-contrasts" and "group-means" parametrization, but this is what they are called on the limma user guide (section 9.2).

modified 16 months ago by Gordon Smyth39k • written 16 months ago by enricoferrero570
Answer: [limma] Define contrasts for model with covariates
2
16 months ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

Just be sure to put condition first in the linear model:

sign_D <- model.matrix(~ 0 + condition + age + gender)

Then you can use contrasts just as you did in Case B.

Thanks Gordon!

Can you please explain why that's the case? I.e.: why only the first categorical variable in the formula gets both levels as coefficients?

2

Because every following term is interpreted as an additive effect. In Gordon's model above:

• There is no intercept because of the ~0.
• condition has healthy and disease levels. There's no intercept, so the function creates a term for healthy, and another term for disease, representing the average log-expression in each group.
• age is real valued, so this just gets added as an extra column.
• gender (which is a social construct, I suppose you really mean sex) has male and female. With the default level ordering, i.e., alphanumeric, Female is treated as the reference. This means that the sum of the existing terms, with no other additions, describe the expected log-expression in female samples. For males, the function adds another term for the male vs. female log-fold change, which is presumed to be additive.

Hence, you get terms for all levels in condition, but by the time you get to gender, you don't get a term for the first reference level. Sometimes you can trick model.matrix into giving you a term, but this usually means that your matrix is not of full rank and that you need to remove the term in order to actually fit the linear model.