Question: Age covariate continuous vs. categorical
L_K0 wrote, 13 months ago:

Dear Bioconductor community,

I have a question about using age as a covariate. As has been suggested several times, I tried categorizing age in order to account for it. However, with my rather small sample size (3 groups, n = 8, n = 5, n = 6), it is quite hard to find good cut points for the ages. My initial attempt with 4 categories ended up very unbalanced across the experimental conditions, so I also tried cutting with 3 breaks. The resulting frequencies are below:

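3 breaks: [frequency table of age category by experimental condition; image not preserved]

4 breaks: [frequency table of age category by experimental condition; image not preserved]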

In both cases, the age categories are severely imbalanced across the experimental conditions.
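To make the binning concrete, here is a minimal Python sketch (standard library only) of roughly what `cut(age, breaks = 3)` followed by a cross-tabulation against condition does in R: split the age range into equal-width intervals and count samples per condition and interval. The group sizes match the question, but the ages, the group labels A/B/C, and the helper names are hypothetical. R's `cut` also pads the range slightly and uses right-closed intervals, which this sketch ignores.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def crosstab(groups, bins):
    """Count samples for each (group, bin) pair."""
    return Counter(zip(groups, bins))


# Hypothetical data: group sizes (8, 5, 6) as in the question, ages invented.
groups = ["A"] * 8 + ["B"] * 5 + ["C"] * 6
ages = [61, 64, 66, 70, 72, 75, 78, 80,
        58, 60, 63, 65, 69,
        71, 74, 77, 79, 82, 85]

bins = equal_width_bins(ages, 3)
table = crosstab(groups, bins)
for g in ["A", "B", "C"]:
    # Counter returns 0 for (group, bin) pairs that never occur.
    print(g, [table[(g, b)] for b in range(3)])
```

With these made-up ages, group B has no samples in the top bin and group C has none in the bottom bin, the same kind of imbalance described above.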

So now I really don't know what to do. I see several options: use age as a categorical covariate (though I still don't know how many breaks would be reasonable); use age as a continuous covariate (which is generally advised against); don't account for age at all (which might be acceptable, since we are investigating a late-onset disease and all individuals are above the critical age); or don't account for age and use SVA instead. I'm unsure about the last option: when I run SVA, I get a significant surrogate variable that correlates with age with a coefficient of -0.45.
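One way to compare the options is to count what each design costs with only 19 samples. The Python sketch below (standard library only) builds treatment-coded design matrices for a design like `~ condition + age` with age continuous, for `~ condition + age_bin` with three age bins, and for a worst case where the bins coincide exactly with the conditions, then reports each matrix's rank and residual degrees of freedom. Group labels, ages, and bin assignments are hypothetical, and the helpers are illustrative, not DESeq2 functions.

```python
def treatment_columns(labels):
    """Indicator (0/1) columns for every level except the first (reference) level."""
    levels = sorted(set(labels))
    return [[1.0 if x == lvl else 0.0 for x in labels] for lvl in levels[1:]]


def design_matrix(columns, n):
    """Row-major design matrix: an intercept column followed by the given columns."""
    return [[1.0] + [col[j] for col in columns] for j in range(n)]


def rank(matrix, tol=1e-9):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Pick the row (at or below r) with the largest entry in column c.
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # column c is a combination of the columns already used
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][c] / m[r][c]
            for k in range(c, n_cols):
                m[i][k] -= f * m[r][k]
        r += 1
    return r


# Hypothetical data: group sizes (8, 5, 6) as in the question; ages and bins invented.
groups = ["A"] * 8 + ["B"] * 5 + ["C"] * 6
ages = [61, 64, 66, 70, 72, 75, 78, 80, 58, 60, 63, 65, 69,
        71, 74, 77, 79, 82, 85]
age_bins = [0, 0, 0, 1, 1, 1, 2, 2, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2]

n = len(groups)
condition = treatment_columns(groups)
designs = {
    "continuous age": condition + [[float(a) for a in ages]],
    "3 age bins": condition + treatment_columns(age_bins),
    # Worst case: age bins that line up exactly with the conditions.
    "confounded bins": condition + treatment_columns(groups),
}
results = {}
for name, cols in designs.items():
    X = design_matrix(cols, n)
    p, r = len(X[0]), rank(X)
    results[name] = (p, r, n - r)
    print(f"{name}: {p} coefficients, rank {r}, residual df {n - r}")
```

A continuous age term uses one degree of freedom, while k age bins use k - 1. If the bins line up with the conditions, the matrix loses rank, meaning age and condition effects cannot be separated.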

Below is the distribution of ages (more precisely, birth years) across the experimental conditions (y axis):

[Figure: birth years by experimental condition; image not preserved]

I would really appreciate your help.

Thanks a lot

-Matt

modified 13 months ago by Michael Love • written 13 months ago by L_K0
Answer: Age covariate continuous vs. categorical
Michael Love (United States) wrote, 13 months ago:

You can add age as a continuous covariate, but keep in mind that a design such as `~ age + ...` implies that gene expression changes multiplicatively (by a constant fold change) with each unit increase in age.
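A small numeric sketch of the multiplicative effect, in Python: if log2 expression is linear in age with coefficient `beta_age`, expected expression is multiplied by `2**beta_age` for every additional year, whatever the starting age. The coefficient 0.05, the baseline values, and the helper name are made up for illustration.

```python
def expected_expression(age, beta_age, log2_at_baseline=5.0, baseline_age=60):
    """Expected expression when log2(expression) is linear in age.

    All other terms of the design are held fixed and folded into
    log2_at_baseline (the log2 expression at baseline_age); both are
    hypothetical values.
    """
    return 2 ** (log2_at_baseline + beta_age * (age - baseline_age))


beta_age = 0.05  # hypothetical log2 fold change per year of age

# Ratio between consecutive ages: the same factor, 2**beta_age, every year.
for a0 in (60, 61, 75):
    ratio = expected_expression(a0 + 1, beta_age) / expected_expression(a0, beta_age)
    print(f"age {a0} -> {a0 + 1}: x{ratio:.4f}")

# Over ten years the per-year factors multiply: 2**(10 * beta_age).
ten_year = expected_expression(70, beta_age) / expected_expression(60, beta_age)
print(f"age 60 -> 70: x{ten_year:.4f}")
```

Over ten years the factor compounds to 2**(10 * 0.05), about 1.41, rather than adding ten equal increments.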

By the way, I'd recommend putting age itself in the model rather than birth year. It is much more interpretable, and it avoids strange intercept values when a covariate ranges over, e.g., 1985-2000 (with birth year, the intercept is the predicted log2 expression for someone born in year 0).
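The intercept issue can be seen with a simple fit on made-up log2 values (Python, standard library only). Ordinary least squares here is a stand-in for the DESeq2 GLM; the collection year of 2020, the birth years, and the expression values are all hypothetical.

```python
def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


# Hypothetical samples born 1985-2000 and collected in 2020 (an assumption).
birth_years = [1985, 1988, 1991, 1994, 1997, 2000]
ages = [2020 - b for b in birth_years]
# Made-up log2 expression that rises by 0.1 per year of age.
log2_expr = [5.0 + 0.1 * (a - 20) for a in ages]

b0_year, b1_year = simple_ols(birth_years, log2_expr)
b0_age, b1_age = simple_ols(ages, log2_expr)
print(f"birth year: intercept {b0_year:.1f}, slope {b1_year:.2f}")
print(f"age:        intercept {b0_age:.1f}, slope {b1_age:.2f}")
```

The two slopes differ only in sign, but the birth-year fit has an intercept of about 205 (a log2 expression for year 0), while the age fit has an intercept of 3.0, the prediction at age 0.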

Thank you very much for your helpful answer. I'll switch from birth year to age.

However, with my limited statistical background I don't understand your remark about the multiplicative effect of a continuous age covariate. Could you briefly elaborate? Thank you.

Check the vignette section on the DESeq2 statistical model (it is also described in the first part of the Results section of the DESeq2 paper).

Suppose one column of the design matrix x holds the age. Multiplying that column by its coefficient beta, and adding the products of the other columns of x with their respective betas, gives the log2 of expected expression. Because age enters on the log2 scale, each one-unit increase in age multiplies expected expression by 2^beta, i.e., expression changes multiplicatively with age.
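In equations, using the model as described in the DESeq2 vignette (K_ij is the count for gene i in sample j, s_j the size factor of sample j, alpha_i the dispersion of gene i, x_jr the design-matrix entries for sample j, beta_ir the coefficients for gene i), and holding all covariates other than age fixed:

```latex
\begin{aligned}
K_{ij} &\sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad \mu_{ij} = s_j\, q_{ij}, \\
\log_2 q_{ij} &= \sum_r x_{jr}\, \beta_{ir}, \\
\log_2 q_{ij}\big|_{\text{age}+1} - \log_2 q_{ij}\big|_{\text{age}} &= \beta_{i,\text{age}}
\quad\Longrightarrow\quad
\frac{q_{ij}\big|_{\text{age}+1}}{q_{ij}\big|_{\text{age}}} = 2^{\beta_{i,\text{age}}}.
\end{aligned}
```

Over an age difference of Delta years, the same argument gives a factor of 2^(Delta * beta_age).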