I am currently figuring out the best model for my differential gene expression analysis and thought about including squared covariate terms to account for non-linear dependencies.
Specifically, I am thinking about the covariates age (of my patients) and RNA integrity (RIN), so that the model would include a squared term for each of them.
RIN^2 is an additional column in my data.
I tried to see how many of my genes are actually correlated with RIN and RIN^2 using limma.
If I only include RIN or RIN^2 in my design, I get many (>1k) genes that show a significant correlation with RIN or RIN^2. However, if I include both RIN and RIN^2, I do not detect any genes that are correlated with either RIN or RIN^2. The actual coefficients for the parameters do not really change, but the estimated standard errors increase strongly, which I think is why they are no longer significant. I suspect that this is due to the strong correlation between RIN and RIN^2.
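To illustrate the collinearity I suspect, here is a minimal sketch in Python on simulated values. It is not my data and not the limma fit itself. The RIN range of 5 to 10, the sample size of 200, and the helper names `pearson` and `vif` are assumptions made only for this illustration. It computes the correlation between a covariate and its square, and the variance inflation factor that this correlation implies for a model containing both terms.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif(r):
    """Variance inflation factor when two predictors have correlation r."""
    return 1.0 / (1.0 - r ** 2)


random.seed(1)

# Hypothetical RIN-like values between 5 and 10 (simulated, not real data).
rin = [random.uniform(5, 10) for _ in range(200)]
rin_sq = [v ** 2 for v in rin]

r_raw = pearson(rin, rin_sq)

# sqrt(VIF) is the factor by which a coefficient's standard error grows
# compared with a design in which the two predictors were uncorrelated.
print(f"correlation(RIN, RIN^2) = {r_raw:.4f}")
print(f"VIF = {vif(r_raw):.1f}, SE inflation = {vif(r_raw) ** 0.5:.1f}x")
```

If the correlation in real data is similarly close to 1, the inflated standard errors would be expected even when the coefficient estimates themselves barely move, which matches what I see.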
I would very much appreciate any insights and thoughts on whether you think it makes sense to include squared covariates when performing differential gene expression analysis.