Rescaling predictors in DESeq changes outcomes?
jc.szamosi ▴ 10
@jcszamosi-9539

When I'm running a GLM I will sometimes need to rescale predictors (e.g. take z-scores) to make the coefficients of different predictors easier to compare to each other. My understanding is that this doesn't and shouldn't change anything about the model fit except the scale of the coefficient. When I do this in DESeq2, I find that it changes not only the log2FC estimates, but also the p-values (before adjustment). I don't understand how or why that would be happening, or what to do about it.
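To make that expectation concrete, here is a minimal sketch using base R's glm() with a Poisson family on simulated data. It is not DESeq2's negative binomial fitting, and the data, seed, and variable names are made up for illustration. In a well-conditioned fit like this one, z-scoring the predictor should multiply the slope by sd(x) and leave its p-value unchanged up to numerical tolerance:

```r
# Hypothetical simulated data: one continuous predictor in the range 50-90
set.seed(1)
n <- 40
x <- runif(n, 50, 90)
y <- rpois(n, lambda = exp(1 + 0.02 * x))

# z-score the predictor (scale() returns a matrix, so flatten it)
x_z <- as.numeric(scale(x))

# Fit the same Poisson GLM on the raw and the z-scored predictor
fit_raw <- glm(y ~ x, family = poisson())
fit_z <- glm(y ~ x_z, family = poisson())

# Wald p-values for the slope in each fit
p_raw <- coef(summary(fit_raw))["x", "Pr(>|z|)"]
p_z <- coef(summary(fit_z))["x_z", "Pr(>|z|)"]

# The slope on the z-scored predictor is the raw slope times sd(x)
ratio <- unname(coef(fit_z)["x_z"] / coef(fit_raw)["x"])

all.equal(p_raw, p_z, tolerance = 1e-6)
all.equal(ratio, sd(x), tolerance = 1e-6)
```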

Does anyone understand this? Is it expected behaviour? What should I do about it?

Thank you!

deseq2 glm rescale • 166 views
@mikelove

Predictors on very different scales can make the fitting numerically unstable. This is a well-known issue in linear modeling, and the usual recommendation is to avoid a design matrix whose columns differ greatly in scale.

Because I noticed some users putting very small and very large values in the design, DESeq2 has printed a message suggesting that predictors be scaled first since v1.26 (so the current and previous releases both give this message).


jc.szamosi, in addition to Mike's answer, what I usually do is transform the data via vst(..., blind = FALSE), and then use the variance-stabilised expression levels for downstream modeling, clustering, regression, etc. A further scaling to Z-scores may be required; however, functions like those in glmnet will automatically 'standardize' / scale the data while fitting the model. One always has to be careful about the default parameters of any function.

Not sure if this is directly related to your question.


Thanks for getting back!

I might be misunderstanding, but I'm not sure this answer applies in my case. The model in question only has one predictor, and the scales are not extreme. It is a continuous, positive predictor with double-digit values (approximate range 50 - 90) and when I rescale it I get values between -2 and 2. Not something I would normally expect to cause numerical instability. Neither the scaled, nor the unscaled version of the model is raising a message about scaling (using version 1.26).


You have an intercept. It approaches collinearity with the intercept when on the scale 50-90.

The unscaled data should raise a message when you make the dataset (since v1.26):

> cts <- matrix(1:16,ncol=4)
> coldata <- data.frame(x=rnorm(4,100))
> dds <- DESeqDataSetFromMatrix(cts, coldata, ~x)
the design formula contains one or more numeric variables that have mean or standard deviation larger than 5 (an arbitrary threshold to trigger this message). it is generally a good idea to center and scale numeric variables in the design to improve GLM convergence.
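
As a rough sketch of what the message is asking for, the numeric column can be centered and scaled with base R's scale() before the dataset is built. The column name x_scaled is just an illustrative choice, and the DESeq2 call is guarded so the snippet also runs where the package is not installed:

```r
# Same toy inputs as above, with a seed so the example is reproducible
set.seed(1)
cts <- matrix(1:16, ncol = 4)
coldata <- data.frame(x = rnorm(4, 100))

# Center to mean 0 and scale to standard deviation 1
# (scale() returns a matrix, so convert back to a plain numeric vector)
coldata$x_scaled <- as.numeric(scale(coldata$x))

mean(coldata$x_scaled)  # approximately 0
sd(coldata$x_scaled)    # 1

# Use the scaled column in the design instead of the raw one
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(cts, coldata, ~ x_scaled)
}
```

Coefficients for x_scaled are then per standard deviation of x rather than per unit of x.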


The predictor approaches collinearity with the intercept, you mean? Huh. That surprises me and I'm not sure I understand it, but I'll go with the scaled data. Thank you!


Take a look at the eigenvalues of X' X as I change the location of the numeric predictor:

> x <- cbind(rep(1,10),-4:5); eigen(t(x)%*%x)$values
[1] 85.331865  9.668135
> x <- cbind(rep(1,10),-4:5+10); eigen(t(x)%*%x)$values
[1] 1194.3092241    0.6907759
> x <- cbind(rep(1,10),-4:5+100); eigen(t(x)%*%x)$values
[1] 1.010950e+05 8.160642e-03
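
The same check can be applied to a predictor in the 50-90 range discussed above. This is only a sketch: x_raw is a made-up, evenly spaced stand-in for the real covariate, and the ratio of largest to smallest eigenvalue of X' X is used here as a simple measure of how close the design is to collinear.

```r
# Hypothetical predictor spanning roughly the range mentioned in the thread
x_raw <- seq(50, 90, length.out = 10)
x_z <- as.numeric(scale(x_raw))  # centered and scaled version

# Design matrices: intercept column plus the predictor
X_raw <- cbind(1, x_raw)
X_z <- cbind(1, x_z)

# Eigenvalues of X' X (crossprod(X) computes t(X) %*% X)
ev_raw <- eigen(crossprod(X_raw))$values
ev_z <- eigen(crossprod(X_z))$values

# Ratio of largest to smallest eigenvalue; larger means closer to collinear
cond_raw <- ev_raw[1] / ev_raw[2]
cond_z <- ev_z[1] / ev_z[2]

cond_raw  # large: the raw predictor is nearly collinear with the intercept
cond_z    # 10/9: the centered column is orthogonal to the intercept
```

After centering, the predictor column sums to zero, so X' X is diagonal and the two columns no longer compete to explain the same overall level.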