Age covariate continuous vs. categorical
L_K (@l_k-14850)

Dear Bioconductor community

I have a question about using age as a covariate. As has been suggested multiple times, I tried to categorize age in order to account for it. However, since my sample size is rather small (3 groups, n=8, n=5, n=6), it is quite hard to find the right way to cut the ages. I initially tried 4 categories, but they ended up very unbalanced between the experimental conditions, so I tried cutting with 3 breaks. You can find the resulting frequencies below:

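3 breaks: [frequency table of age category by experimental condition; image not available]

4 breaks: [frequency table of age category by experimental condition; image not available]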

As you can see, there is always a fairly severe imbalance between the age categories and the experimental conditions.
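For illustration, the binning-and-tabulating step can be sketched in Python. This is a minimal sketch assuming equal-width bins between the youngest and oldest sample (roughly what R's cut() does when given a number of breaks, though not identical at the boundaries). The function names, the group labels A/B/C and all ages are hypothetical; only the group sizes (8, 5, 6) come from the post. Zero cells in the resulting table are the kind of imbalance described above: the age category is then partly confounded with the experimental condition.

```python
import numpy as np

def cut_equal_width(ages, n_bins):
    """Assign each age to one of n_bins equal-width, right-closed intervals spanning min..max.

    Returns integer bin indices 0..n_bins-1 (hypothetical helper, similar in spirit to R's cut()).
    """
    ages = np.asarray(ages, dtype=float)
    edges = np.linspace(ages.min(), ages.max(), n_bins + 1)
    # Only the interior edges matter: values <= the first interior edge go to bin 0,
    # values above the last interior edge go to bin n_bins - 1.
    return np.digitize(ages, edges[1:-1], right=True)

def crosstab(groups, bins, n_bins):
    """Count samples per (group, age bin); a zero cell means a group has no samples in that age range."""
    table = {g: [0] * n_bins for g in sorted(set(groups))}
    for g, b in zip(groups, bins):
        table[g][int(b)] += 1
    return table

# Hypothetical ages; only the group sizes (8, 5, 6) are taken from the question.
ages_by_group = {
    "A": [62, 65, 70, 71, 74, 78, 80, 83],
    "B": [60, 61, 66, 69, 72],
    "C": [75, 77, 79, 81, 84, 88],
}
groups = [g for g, group_ages in ages_by_group.items() for _ in group_ages]
ages = [a for group_ages in ages_by_group.values() for a in group_ages]

for n_bins in (3, 4):
    bins = cut_equal_width(ages, n_bins)
    print(n_bins, "bins:", crosstab(groups, bins, n_bins))
```

With so few samples per group, adding breaks mostly creates more empty cells rather than better age adjustment.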

So now I really do not know what to do. There are several options: use age as a categorical covariate (I still don't know how many breaks would be reasonable); use age as a continuous covariate (this is not what is usually suggested); don't account for age at all (which might be acceptable, since we are investigating a late-onset disease and all individuals are above the critical age); or don't account for age and use SVA instead (I'm not sure about that one: if I do, I get a significant surrogate variable that correlates with age with a correlation coefficient of -0.45...).

Below you can find the distribution of ages (or rather, birth years) across the experimental conditions (y axis):

I would really appreciate your help.

Thanks a lot

-Matt

@mikelove (United States)

You can add age as a continuous covariate, but keep in mind that a design such as ~age + ... implies that gene expression changes multiplicatively (by a constant fold change) with each unit of age.
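To make the point about multiplicative increases concrete, here is a minimal sketch assuming a hypothetical gene whose log2 expression is linear in age, with made-up coefficients (intercept_log2 = 3, beta_age = 0.05). One extra year multiplies expression by the same factor, 2^beta_age, whatever the starting age, whereas the absolute increase gets larger as expression rises.

```python
def expected_expression(intercept_log2, beta_age, age):
    """Expression on the original scale when log2(expression) = intercept_log2 + beta_age * age."""
    return 2 ** (intercept_log2 + beta_age * age)

# Hypothetical coefficients for one gene (not estimated from any data).
intercept_log2 = 3.0
beta_age = 0.05  # log2 fold change per year of age

for start in (40, 60):
    before = expected_expression(intercept_log2, beta_age, start)
    after = expected_expression(intercept_log2, beta_age, start + 1)
    # The ratio is the same at every starting age (2 ** beta_age, about 1.035),
    # while the absolute difference grows with the expression level.
    print(start, "->", start + 1, "ratio:", after / before, "difference:", after - before)
```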

By the way, I'd recommend putting the actual age in the model rather than the birth year. It's much more interpretable this way, and it avoids odd changes to the intercept caused by a covariate ranging over e.g. 1985-2000.
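A sketch of the intercept issue, using ordinary least squares on log2 values as a simplified stand-in for DESeq2's negative binomial GLM (the reparameterization argument is the same). The ages, the collection year of 2020 and the noise-free expression values are all hypothetical. Fitting on birth year gives the same slope with the opposite sign, but the intercept becomes the predicted log2 expression for someone born in year 0, far outside the data.

```python
import numpy as np

# Hypothetical data: ages of six samples and noise-free log2 expression for one gene.
age = np.array([60.0, 64.0, 68.0, 72.0, 76.0, 80.0])
log2_expr = 5.0 + 0.02 * age

reference_year = 2020.0  # hypothetical year of sample collection
birth_year = reference_year - age

def ols(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns the array [b0, b1]."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b0_age, b1_age = ols(age, log2_expr)
b0_by, b1_by = ols(birth_year, log2_expr)

print("age model:        intercept", b0_age, "slope", b1_age)
print("birth-year model: intercept", b0_by, "slope", b1_by)
```

The slope magnitude (0.02 log2 units per year) is identical in both fits; only the meaning of the intercept changes.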


Thank you very much for your valuable answer. I'll switch from year of birth to age.

However, my limited statistical knowledge doesn't allow me to understand your remark about the multiplicative effect of a continuous age covariate. Could you briefly elaborate on this? Thank you.


Check the vignette section on the statistical model of DESeq2 (it is also described in the first part of the Results section of the DESeq2 paper).

If x has a column giving the age, then a coefficient beta multiplied by that age column (plus the other columns of x multiplied by their respective betas) gives you the log2 of expression. Because the model is linear on the log2 scale, each one-unit increase in age adds beta to log2 expression, i.e. multiplies expression by 2^beta. This is what is meant by multiplicative increases in expression with increases in age.
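Written out in the notation of the DESeq2 paper (q_ij is the expression strength of gene i in sample j, x_jr the design matrix, β_ir the coefficients), here is a short derivation, assuming all other covariates are held fixed. β_{i,age} and C_ij are labels introduced here for the age coefficient and for the summed contribution of the remaining columns of x.

```latex
\begin{aligned}
\log_2 q_{ij} &= \sum_r x_{jr}\,\beta_{ir} = \beta_{i0} + \beta_{i,\mathrm{age}}\,\mathrm{age}_j + C_{ij},\\
\frac{q_{ij}\,\big|_{\mathrm{age}_j = a+\Delta}}{q_{ij}\,\big|_{\mathrm{age}_j = a}}
  &= \frac{2^{\beta_{i0} + \beta_{i,\mathrm{age}}(a+\Delta) + C_{ij}}}{2^{\beta_{i0} + \beta_{i,\mathrm{age}}\,a + C_{ij}}}
   = 2^{\Delta\,\beta_{i,\mathrm{age}}}.
\end{aligned}
```

With Δ = 1 this is the per-year fold change; over Δ years the per-year fold changes compound, so ten years of age multiply expression by the per-year fold change raised to the tenth power.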


Ok, got it. Thanks a lot for your time!