When I'm running a GLM I will sometimes need to rescale predictors (e.g. take z-scores) to make the coefficients of different predictors easier to compare to each other. My understanding is that this doesn't and shouldn't change anything about the model fit except the scale of the coefficient. When I do this in DESeq2, I find that it changes not only the log2FC estimates, but also the p-values (before adjustment). I don't understand how or why that would be happening, or what to do about it.
Does anyone understand this? Is it expected behaviour? What should I do about it?
Thank you!
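For concreteness, here is a minimal sketch of the kind of comparison I mean, on simulated data (the continuous covariate 'age' and its 50-90 range are invented for illustration; this is not my real dataset):

library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 12)
dds$age <- runif(ncol(dds), 50, 90)      # hypothetical continuous predictor

# Fit once with the raw predictor...
design(dds) <- ~ age
dds_raw <- DESeq(dds)

# ...and once with the z-scored predictor
dds$age_z <- as.numeric(scale(dds$age))
design(dds) <- ~ age_z
dds_z <- DESeq(dds)

# Compare unadjusted p-values gene by gene
head(cbind(raw = results(dds_raw, name = "age")$pvalue,
           z   = results(dds_z, name = "age_z")$pvalue))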
jc.szamosi, in addition to Mike's answer, what I usually do is transform the data via vst(..., blind = FALSE), and then use the variance-stabilised expression levels for downstream modeling, clustering, regression, etc. A further scaling to Z-scores may be required; however, functions like those in glmnet will automatically 'standardize' / scale the data while fitting the model. One always has to be careful about the default parameters of all functions. Not sure if this is directly related to your question.
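As a rough sketch of that workflow, assuming an existing DESeqDataSet dds with a two-level factor dds$condition (the glmnet call is purely illustrative):

library(DESeq2)
library(glmnet)

vsd <- vst(dds, blind = FALSE)   # variance-stabilising transformation
mat <- t(assay(vsd))             # samples x genes, for downstream modeling

mat_z <- scale(mat)              # optional further scaling to Z-scores

# glmnet standardises predictors internally by default (standardize = TRUE),
# so the explicit Z-scoring above may be redundant in this case
fit <- glmnet(mat, dds$condition, family = "binomial")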
Thanks for getting back!
I might be misunderstanding, but I'm not sure this answer applies in my case. The model in question has only one predictor, and the scales are not extreme. It is a continuous, positive predictor with double-digit values (approximate range 50-90), and when I rescale it I get values between -2 and 2, which is not something I would normally expect to cause numerical instability. Neither the scaled nor the unscaled version of the model raises a message about scaling (I'm using version 1.26).
You have an intercept in the design. The predictor approaches collinearity with the intercept when it is on the scale 50-90: a column of values that are all around 50-90 is nearly a constant multiple of the column of 1s.
The unscaled data should raise a message when you make the dataset (since v1.26).
The predictor approaches collinearity with the intercept, you mean? Huh. That surprises me and I'm not sure I understand it, but I'll go with the scaled data. Thank you!
Take a look at the eigenvalues of X'X as I change the location of the numeric predictor:
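Something along these lines (an illustrative sketch; the shifts are arbitrary):

set.seed(1)
x <- as.numeric(scale(runif(12, 50, 90)))   # centred, unit-variance predictor

for (shift in c(0, 10, 70)) {
  X <- cbind(intercept = 1, pred = x + shift)
  cat("shift =", shift,
      " eigenvalues:", eigen(crossprod(X), only.values = TRUE)$values, "\n")
}

# As the predictor's location moves away from zero, the smaller eigenvalue
# of X'X shrinks relative to the larger one: the predictor column becomes
# nearly a multiple of the intercept column, i.e. near-collinear.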