Question: Calculating contrasts for continuous variable anad interaction effect in Limma
2
3.7 years ago by
jaison7520
United States
jaison7520 wrote:

Hi,

I am a little confused about constructing the the appropriate contrast matrix in Limma when I have continuous variables  as one of the predictors and an interaction term in my model.  I am posting a minimal example below.

library(limma)
age = sample(c(20:75),80, replace=T)
gender <- sample(c("M", "F"), 80, replace =T)
gender <- as.factor(gender)
ageGrp <- ifelse(age>35, "old", "young")

# using age as factors I create the design matrix and calculate
# contrast for Old vs Young in Females and Males respectively
design <- model.matrix(~gender*ageGrp)
unique(design)
colnames(design) <- c("intercept", "gMale", "aYoung", "MxY")
cont.matrix <- makeContrasts(
F_OvY = -aYoung,
M_OvY = -aYoung - MxY, levels = design
)

How would I do the same analysis if age were continuous, ie if I use age instead of the ageGrp in the code above?

Thanks

-Jaison

modified 3.7 years ago by Aaron Lun24k • written 3.7 years ago by jaison7520
Answer: Calculating contrasts for continuous variable anad interaction effect in Limma
3
3.7 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

You can put in age as a continuous covariate (I'll use an intercept-only model here, as it's easier to explain):

design <- model.matrix(~0 + gender + gender:age)

Running colnames(design) gives us:

[1] "genderF"     "genderM"     "genderF:age" "genderM:age"

To understand what's going on here, imagine fitting a line to the expression values of all female samples against the age. The first and third coefficients represent the intercept and gradient, respectively, of this fitted line. The same reasoning applies to the second and fourth coefficients for the male samples. If you want to test for an age effect in either sex, drop the coefficient corresponding to the gradient term for that sex. You can also do more complex comparisons, e.g., compare the two gradient terms to each other to identify genes that exhibit sex-specific age effects.

Now, the linear approach I've described above is somewhat inflexible, as it doesn't allow for non-linear trends in expression with respect to age. If you have enough samples, you can improve upon the model by using splines:

require(splines)
X <- ns(age, df=5) # Any df between 3 - 5 usually works well.
design <- model.matrix(~0 + gender + gender:X)


This will fit a spline with 5 d.f. to the expression values against the age for each sex, which allows the model to handle non-linear trends. You'll end up with 5 spline coefficients for each sex, in comparison to the single gradient term you had before for the linear approach. To test for an age effect, you should drop all of the spline coefficients for each sex. However, be warned that the log-fold changes you get from this approach won't have a great amount of meaning, as the values of the spline coefficients don't have an obvious interpretation.

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by Aaron Lun24k

Hi Aaron,

I am also dealing with continuous covariate in my design matrix as well. I am using edgeR, I have created a following design matrix:

design1 <- model.matrix(~groups+groups:age)

where there are four different groups.

So, colname(design1) would give:

(Intercept) group2 group3 group4 group1:age group2:age group3:age group4:age.

If I were to do a comparison such as: lrt <- glmLRT(fit, coef=2), what am I comparing? Am I merely comparing group1 and group2? It sounds like (from what you wrote above), this comparison is affected by the age covariate included in the design matrix. If I were to create a design matrix: design2 <- model.matrix(~group), and perform the same comparison between group1 and group2, would that give a same result?

Also, using design1 again, if I were to compare group2:age with intercept (coef=6), would that be testing for age effect on group2? And comparing group1:age and group2:age would be testing for the differences in age effect between group1 and group2?

My apologies for asking a lot of questions. I am quite confused by the usage of design matrix, especially with continuous variables interaction. Thank you.