Rescaling predictors in DESeq changes outcomes?
jc.szamosi (@jcszamosi-9539) asked:

When I'm running a GLM I will sometimes need to rescale predictors (e.g. take z-scores) to make the coefficients of different predictors easier to compare to each other. My understanding is that this doesn't and shouldn't change anything about the model fit except the scale of the coefficient. When I do this in DESeq2, I find that it changes not only the log2FC estimates, but also the p-values (before adjustment). I don't understand how or why that would be happening, or what to do about it.
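For example, with an ordinary linear model on made-up data, rescaling the predictor changes the slope and its standard error but not the t-statistic or p-value, which is what I would expect:

set.seed(1)
x <- runif(20, 50, 90)                        # predictor on its raw scale
y <- rnorm(20, mean = 0.1 * x)                # made-up response
summary(lm(y ~ x))$coefficients[2, ]          # slope, SE, t, p
summary(lm(y ~ scale(x)))$coefficients[2, ]   # same t and p; slope and SE rescaled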

Does anyone understand this? Is it expected behaviour? What should I do about it?

Thank you!

Tags: deseq2, glm, rescale
@mikelove answered:

Having predictors on very different scales can make the fitting numerically unstable; this is a well-known issue in linear modeling, and it is generally recommended not to use a design matrix whose predictors are on very different scales.

Because I noticed some users putting very small and very large values in the design, DESeq2 has, since v1.26, given a message suggesting that the predictors be centered and scaled first (so the current and previous releases give this message).
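As a minimal sketch of what that looks like in practice (the covariate name here is hypothetical, not from the original post), the numeric predictor can be centered and scaled before building the dataset:

coldata$age_scaled <- as.numeric(scale(coldata$age))   # z-score: (x - mean(x)) / sd(x)
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData   = coldata,
                              design    = ~ age_scaled)
dds <- DESeq(dds)
res <- results(dds, name = "age_scaled")
# the log2 fold change is now per standard deviation of 'age', not per unit of 'age'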

Another user commented:
jc.szamosi, in addition to Mike's answer: what I usually do is transform the data via vst(..., blind = FALSE) and then use the variance-stabilised expression levels for downstream modelling, clustering, regression, etc. A further scaling to Z-scores may be required; however, functions like those in glmnet will automatically 'standardize' / scale the data while fitting the model, so one always has to be careful about the default parameters of the functions involved.

Not sure if this is directly related to your question.
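For what it's worth, a minimal sketch of that workflow (the glmnet call is only an illustration, and the covariate name is assumed from earlier in the thread):

vsd <- vst(dds, blind = FALSE)   # design-aware variance stabilisation
mat <- t(assay(vsd))             # samples x genes, for downstream modelling
# e.g. penalised regression; glmnet standardises the columns by default
# fit <- glmnet::glmnet(mat, coldata$age_scaled, standardize = TRUE)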

jc.szamosi replied:

Thanks for getting back!

I might be misunderstanding, but I'm not sure this answer applies in my case. The model in question has only one predictor, and its scale is not extreme: it is a continuous, positive predictor with double-digit values (approximate range 50-90), and when I rescale it I get values between -2 and 2. That is not something I would normally expect to cause numerical instability. Neither the scaled nor the unscaled version of the model raises a message about scaling (I'm using version 1.26).

@mikelove replied:

You have an intercept, and on the 50-90 scale the predictor approaches collinearity with the intercept column: a column of values that are all large relative to their spread is nearly proportional to the column of ones.

The unscaled data should raise a message when you make the dataset (since v1.26):

> cts <- matrix(1:16,ncol=4)
> coldata <- data.frame(x=rnorm(4,100))
> dds <- DESeqDataSetFromMatrix(cts, coldata, ~x)
the design formula contains one or more numeric variables that have mean or standard deviation larger than 5 (an arbitrary threshold to trigger this message). it is generally a good idea to center and scale numeric variables in the design to improve GLM convergence.
jc.szamosi replied:

The predictor approaches collinearity with the intercept, you mean? Huh. That surprises me and I'm not sure I understand it, but I'll go with the scaled data. Thank you!

@mikelove replied:

Take a look at the eigenvalues of X' X as I change the location of the numeric predictor:

> x <- cbind(rep(1,10),-4:5); eigen(t(x)%*%x)$values
[1] 85.331865  9.668135
> x <- cbind(rep(1,10),-4:5+10); eigen(t(x)%*%x)$values
[1] 1194.3092241    0.6907759
> x <- cbind(rep(1,10),-4:5+100); eigen(t(x)%*%x)$values
[1] 1.010950e+05 8.160642e-03
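The smallest eigenvalue collapsing towards zero means X'X is close to singular, which is what makes the coefficient estimates unstable. The same thing shows up in the condition number (the ratio of the largest to the smallest eigenvalue); a quick sketch using base R's kappa():

x <- cbind(rep(1,10), -4:5);       kappa(t(x) %*% x, exact = TRUE)   # roughly 8.8
x <- cbind(rep(1,10), -4:5 + 100); kappa(t(x) %*% x, exact = TRUE)   # roughly 1.2e7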