Question

Rescaling predictors in DESeq changes outcomes?

jc.szamosi

When I'm running a GLM I will sometimes need to rescale predictors (e.g. take z-scores) to make the coefficients of different predictors easier to compare to each other. My understanding is that this doesn't and shouldn't change anything about the model fit except the scale of the coefficient. When I do this in DESeq2, I find that it changes not only the log2FC estimates, but also the p-values (before adjustment). I don't understand how or why that would be happening, or what to do about it.
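The expectation described above can be checked outside DESeq2. The sketch below uses base R and assumes a Poisson `glm` on simulated data as a stand-in for DESeq2's negative binomial fit. The seed, sample size and coefficients are arbitrary. With a well-converged fit, z-scoring the predictor multiplies the slope by sd(x) but leaves the Wald z statistic, and therefore the p-value, unchanged.

```r
# Base-R sketch (not DESeq2): simulated counts with a positive predictor in
# roughly the 50-90 range, fitted once on the raw scale and once z-scored.
set.seed(1)
n <- 40
x <- runif(n, 50, 90)                      # raw predictor
y <- rpois(n, lambda = exp(1 + 0.03 * x))  # simulated counts

z <- as.numeric(scale(x))                  # z-score: (x - mean(x)) / sd(x)

fit_raw    <- glm(y ~ x, family = poisson)
fit_scaled <- glm(y ~ z, family = poisson)

coef_raw    <- summary(fit_raw)$coefficients["x", ]
coef_scaled <- summary(fit_scaled)$coefficients["z", ]

# The slope changes by a factor of sd(x) ...
slope_ratio <- coef_scaled["Estimate"] / coef_raw["Estimate"]
print(c(sd_x = sd(x), slope_ratio = unname(slope_ratio)))

# ... but the Wald z statistic and its p-value do not.
print(rbind(raw = coef_raw, scaled = coef_scaled))
```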

Does anyone understand this? Is it expected behaviour? What should I do about it?

Thank you!

deseq2 glm rescale

updated 5.7 years ago by Michael Love • written 5.7 years ago by jc.szamosi

score 0 · Answer 1 · 2020-05-28

Michael Love

Predictors on very different scales can make the fitting numerically unstable. This is well known in linear modeling, and it is typically recommended not to use a design matrix whose predictors have very different scales.
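Mismatched scales also show up directly in the design matrix. The base-R sketch below assumes two hypothetical predictors: `dose`, with values in the thousands, and `frac`, with values between 0 and 1. `kappa(..., exact = TRUE)` reports the 2-norm condition number of each design. Large values indicate an ill-conditioned least-squares problem.

```r
# Hypothetical design: one predictor in the thousands, one between 0 and 1.
set.seed(2)
n <- 20
dose <- runif(n, 1000, 5000)
frac <- runif(n, 0, 1)

X_raw    <- model.matrix(~ dose + frac)
X_scaled <- model.matrix(~ scale(dose) + scale(frac))

# kappa(exact = TRUE): ratio of largest to smallest singular value of X.
k_raw    <- kappa(X_raw, exact = TRUE)
k_scaled <- kappa(X_scaled, exact = TRUE)
print(c(raw = k_raw, scaled = k_scaled))
```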

Because I noticed some users putting very small and very large values in the design, since v1.26 DESeq2 gives a message recommending that numeric predictors be centered and scaled first (so both the current and the previous release give this message).
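The check behind that message can be approximated in a few lines of base R. In this sketch, `flag_large_numeric()` is a hypothetical helper that mirrors the "mean or standard deviation larger than 5" threshold quoted in the DESeq2 message further down this thread. It is not DESeq2's own implementation. The `coldata` table and its `condition`, `x` and `x_z` columns are also made up for illustration.

```r
# Hypothetical helper: flags numeric columns whose mean or SD exceeds the
# threshold, mirroring the rule quoted in the DESeq2 message (not DESeq2 code).
flag_large_numeric <- function(df, threshold = 5) {
  is_num <- vapply(df, is.numeric, logical(1))
  big <- vapply(df[is_num], function(v) {
    abs(mean(v)) > threshold || sd(v) > threshold
  }, logical(1))
  names(big)[big]
}

# Hypothetical sample table with a predictor in the 50-90 range.
coldata <- data.frame(
  condition = factor(c("a", "a", "b", "b", "a", "b")),
  x = c(52, 61, 70, 77, 84, 90)
)
flagged <- flag_large_numeric(coldata)

# Centre and scale; keep the raw column so coefficients can be back-converted.
coldata$x_z <- as.numeric(scale(coldata$x))
flagged_after <- flag_large_numeric(coldata[, c("condition", "x_z")])
print(list(before = flagged, after = flagged_after))
```

The scaled column would then replace the raw one in the design (for example `~ condition + x_z`). Keeping the raw column lets you convert a per-SD coefficient back to a per-unit coefficient by dividing it by `sd(coldata$x)`.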

5.7 years ago by Michael Love

jc.szamosi, in addition to Mike's answer, what I usually do is transform the data via vst(..., blind = FALSE) and then use the variance-stabilised expression levels for downstream modeling, clustering, regression, etc. A further scaling to Z-scores may be required; however, some functions, such as those in glmnet, automatically standardize the predictors while fitting the model (glmnet's standardize argument defaults to TRUE). Always be careful about the default parameters of every function you use.

Not sure if this is directly related to your question.
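The sketch below shows that workflow and assumes DESeq2 is installed. `makeExampleDESeqDataSet()` only supplies simulated counts here. The per-gene z-scoring and the `prcomp()` call are illustrative choices, not a prescribed pipeline.

```r
library(DESeq2)

set.seed(42)
# Simulated counts from DESeq2's example helper (stand-in for real data);
# n is large enough for vst()'s default subsample of 1000 genes.
dds <- makeExampleDESeqDataSet(n = 4000, m = 8)

# Variance-stabilising transformation that uses the design (~ condition).
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)                       # genes x samples

# Drop genes that are constant across samples (they cannot be z-scored).
mat <- mat[apply(mat, 1, sd) > 0, ]

# Per-gene z-scores: scale() works on columns, so transpose twice.
zmat <- t(scale(t(mat)))

# Defaults differ between functions: prcomp() centres but does not scale
# unless scale. = TRUE is passed explicitly.
pca <- prcomp(t(mat), scale. = FALSE)
print(summary(pca)$importance[, 1:2])
```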

5.7 years ago by Kevin Blighe

Thanks for getting back!

I might be misunderstanding, but I'm not sure this answer applies in my case. The model in question has only one predictor, and its scale is not extreme: it is a continuous, positive predictor with double-digit values (approximate range 50 to 90), and when I rescale it I get values between -2 and 2. That is not something I would normally expect to cause numerical instability. Neither the scaled nor the unscaled version of the model raises a message about scaling (using version 1.26).

5.7 years ago by jc.szamosi

You have an intercept. On the 50-90 scale, the predictor's column in the design matrix is close to collinear with the intercept column.

The unscaled data should raise a message when you create the dataset (since v1.26):

> cts <- matrix(1:16,ncol=4)
> coldata <- data.frame(x=rnorm(4,100))
> dds <- DESeqDataSetFromMatrix(cts, coldata, ~x)
the design formula contains one or more numeric variables that have mean or standard deviation larger than 5 (an arbitrary threshold to trigger this message). it is generally a good idea to center and scale numeric variables in the design to improve GLM convergence.
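The base-R sketch below shows what "collinear with the intercept" means geometrically. It computes the cosine of the angle between the intercept column and the predictor column of the design matrix. The evenly spaced `x_raw` values and the helper `cos_with_intercept()` are illustrative assumptions.

```r
# Cosine of the angle between the intercept column (all ones) and a predictor
# column; values near 1 mean the two columns are nearly collinear.
cos_with_intercept <- function(x) {
  ones <- rep(1, length(x))
  sum(ones * x) / (sqrt(sum(ones^2)) * sqrt(sum(x^2)))
}

x_raw     <- seq(50, 90, by = 5)       # predictor on the 50-90 scale
x_centred <- x_raw - mean(x_raw)       # centring alone
x_z       <- as.numeric(scale(x_raw))  # centring and scaling

print(c(raw     = cos_with_intercept(x_raw),
        centred = cos_with_intercept(x_centred),
        z       = cos_with_intercept(x_z)))
```

On the 50-90 scale the cosine is about 0.98, so the two columns point in almost the same direction. Centring alone brings the cosine to zero, and scaling afterwards only changes the length of the column.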

5.7 years ago by Michael Love

The predictor approaches collinearity with the intercept, you mean? Huh. That surprises me and I'm not sure I understand it, but I'll go with the scaled data. Thank you!

5.7 years ago by jc.szamosi

Take a look at the eigenvalues of X'X (where X is the design matrix) as I shift the location of the numeric predictor:

> x <- cbind(rep(1,10),-4:5); eigen(t(x)%*%x)$values
[1] 85.331865  9.668135
> x <- cbind(rep(1,10),-4:5+10); eigen(t(x)%*%x)$values
[1] 1194.3092241    0.6907759
> x <- cbind(rep(1,10),-4:5+100); eigen(t(x)%*%x)$values
[1] 1.010950e+05 8.160642e-03
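The eigenvalue ratio, which is the condition number of X'X, condenses each of the designs in the transcript above into a single number. In this base-R sketch, the helper `eig_ratio()` is hypothetical. A centred version of the shift-100 predictor is included for comparison.

```r
# Ratio of largest to smallest eigenvalue of X'X for an intercept-plus-x design.
eig_ratio <- function(x) {
  X <- cbind(1, x)                     # intercept column plus the predictor
  ev <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values
  max(ev) / min(ev)
}

shifts <- c(0, 10, 100)
ratios <- sapply(shifts, function(s) eig_ratio(-4:5 + s))
names(ratios) <- paste0("shift_", shifts)

# Same predictor as the shift-100 case, but centred before fitting.
centred <- eig_ratio((-4:5 + 100) - mean(-4:5 + 100))
print(signif(c(ratios, centred = centred), 4))
```

With these inputs the ratio grows from about 8.8 with no shift to roughly 1.2e7 with a shift of 100. Centring the shifted predictor brings it back to 8.25.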

5.7 years ago by Michael Love