3.9 years ago by
Cambridge, United Kingdom
You can put in
age as a continuous covariate (I'll use an intercept-only model here, as it's easier to explain):
design <- model.matrix(~0 + gender + gender:age)
colnames(design) gives us:
 "genderF" "genderM" "genderF:age" "genderM:age"
To understand what's going on here, imagine fitting a line to the expression values of all female samples against the age. The first and third coefficients represent the intercept and gradient, respectively, of this fitted line. The same reasoning applies to the second and fourth coefficients for the male samples. If you want to test for an age effect in either sex, drop the coefficient corresponding to the gradient term for that sex. You can also do more complex comparisons, e.g., compare the two gradient terms to each other to identify genes that exhibit sex-specific age effects.
Now, the linear approach I've described above is somewhat inflexible, as it doesn't allow for non-linear trends in expression with respect to age. If you have enough samples, you can improve upon the model by using splines:
X <- ns(age, df=5) # Any df between 3 - 5 usually works well.
design <- model.matrix(~0 + gender + gender:X)
This will fit a spline with 5 d.f. to the expression values against the age for each sex, which allows the model to handle non-linear trends. You'll end up with 5 spline coefficients for each sex, in comparison to the single gradient term you had before for the linear approach. To test for an age effect, you should drop all of the spline coefficients for each sex. However, be warned that the log-fold changes you get from this approach won't have a great amount of meaning, as the values of the spline coefficients don't have an obvious interpretation.
modified 3.9 years ago
3.9 years ago by
Aaron Lun • 24k