Age covariate continuous vs. categorical
L_K • 0

Dear Bioconductor community


I have a question regarding the use of age as a covariate. As has been suggested multiple times, I tried to categorize the age covariate in order to account for it. However, since I have a rather small sample size (3 groups; n=8, n=5, n=6), it is hard to find the right cut points for the ages. I initially tried 4 categories, which turned out to be very unbalanced across the experimental conditions, so I then tried cutting with 3 breaks. You can find the resulting frequencies below:

3 breaks:

4 breaks:

As you can see there is always a pretty severe imbalance between the age categories and the experimental conditions.

So now I really do not know what to do. There are multiple options: use age as a categorical covariate (I still don't know how many breaks would be reasonable), use age as a continuous covariate (this is not suggested), don't account for age (might be OK, since we are investigating a late-onset disease and all individuals are over the critical age), or don't account for age and use SVA (I am not sure about that one; if I do that, I get a significant surrogate variable that correlates with age with a coefficient of -0.45).

Below you can find the distribution of ages (or rather birth years) across the experimental conditions (y-axis).

I would really appreciate your help.

Thanks a lot


deseq2 linear model covariates age design model • 3.3k views

You can add age as a continuous covariate, but keep in mind that a design such as ~age + ... implies that gene expression changes multiplicatively with each unit of age.

By the way, I'd recommend putting the actual age in the model rather than the birth year. It's much more interpretable that way, and it avoids weird changes to the intercept that arise because one of the covariates has a range of, e.g., 1985-2000.
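To illustrate the intercept point, here is a minimal sketch in plain Python (not DESeq2 itself). An ordinary least-squares line stands in for the per-gene model, and the sampling year, birth years, and log2 expression values are all made up for illustration. It fits the same data once against age and once against birth year.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# All values below are hypothetical, chosen only for illustration.
SAMPLING_YEAR = 2015
birth_years = [1985, 1988, 1991, 1994, 1997, 2000]
ages = [SAMPLING_YEAR - b for b in birth_years]
log2_expr = [8.1, 7.9, 7.4, 7.3, 6.8, 6.6]  # made-up log2 expression of one gene

int_age, slope_age = fit_line(ages, log2_expr)
int_year, slope_year = fit_line(birth_years, log2_expr)

# The slopes have the same magnitude with opposite sign, but the intercepts differ:
# with age it is the extrapolated log2 expression at age 0,
# with birth year it is the extrapolated log2 expression at calendar year 0.
print("age model:        intercept =", int_age, " slope =", slope_age)
print("birth-year model: intercept =", int_year, " slope =", slope_year)
```

The age effect itself is unchanged apart from its sign. Only the intercept moves, to a value far outside the range of the data, which is what makes the birth-year version harder to interpret.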


Thank you very much for your valuable answer. I'll fix the age/Year of Birth thing.

However, my limited statistical knowledge doesn't allow me to understand your remark regarding the multiplicative effect of a continuous age covariate. Would it be possible to quickly elaborate on this?  Thank you.


Check the vignette section on the statistical model of DESeq2 (or it's also in the first section of the Results of the DESeq2 paper).

If you have a column of x that gives the age, and a coefficient beta that is multiplied with the age column (along with the other columns of x and their respective betas), the sum gives you the log2 of expression. Because the model is linear on the log2 scale, each one-unit increase in age changes expression by a constant factor of 2^beta, i.e. you get multiplicative increases in expression with increases in age.
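As a small numeric sketch of this point (plain Python, not DESeq2 code): the coefficients below are hypothetical values for a single gene, not the output of any real fit, and the other covariates are left out for brevity.

```python
# Hypothetical coefficients for one gene on the log2 scale (illustration only).
beta_intercept = 5.0  # log2 expression at age 0
beta_age = 0.1        # change in log2 expression per year of age

def expected_expression(age):
    """Expected expression on the original scale under a log2-linear model."""
    log2_expr = beta_intercept + beta_age * age
    return 2.0 ** log2_expr

# One extra year multiplies expression by the same factor at any starting age.
fold_change_per_year = 2.0 ** beta_age
ratio_60_to_61 = expected_expression(61) / expected_expression(60)
ratio_70_to_71 = expected_expression(71) / expected_expression(70)

# Ten extra years multiply expression by 2 ** (10 * beta_age).
ratio_60_to_70 = expected_expression(70) / expected_expression(60)

print(fold_change_per_year, ratio_60_to_61, ratio_70_to_71, ratio_60_to_70)
```

The additive term beta * age on the log2 scale becomes a constant fold change per year on the original scale. That is the assumption you accept when you enter age as a continuous covariate.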


Ok, got it. Thanks a lot for your time!

