Question

Wrong stdev.unscaled from contrasts.fit when lmFit contains a weights matrix

0

Entering edit mode

Frederik Ziebell ▴ 20

@frederik-ziebell-14676

Last seen 25 days ago

Heidelberg, Germany

When lmFit is used with a weights matrix, contrasts.fit produces a wrong stdev.unscaled, which is far off from the true value. Here's an example:

Y <- matrix(0, nrow=2, ncol=6)
X <- matrix(c(1,1,1,1,0,0,1,1,0,0,1,0,0,0,1,1,0,1), ncol=3)
weights <- matrix(c(rep(1,4),.2,.4,rep(1,6)), byrow=T, nrow=2)
contr <- c(0,-1,1)
fit <- lmFit(Y, X, weights=weights)
fit <- contrasts.fit(fit, contrasts=contr)

# wrong value
fit$stdev.unscaled[1]
1.384713

# correct value
Xw <- diag(sqrt(weights[1,])) %*% X
sqrt(contr %*% chol2inv(qr(Xw)$qr) %*% contr)
0.9393364

I know the note in contrasts.fit mentions that in case of precision weights or missing values, stdev.unscaled is approximate because the design matrix is not re-factorized for every gene, but for the end user it is difficult to judge if a test statistic is well approximated or not. It would be good to display a warning if contrasts.fit is run in a weights-matrix setting. In my case, the approximation caused a drop in power from 70% to 30%.

limma • 789 views

ADD COMMENT • link updated 2.7 years ago by Gordon Smyth 50k • written 2.7 years ago by Frederik Ziebell ▴ 20

score 0 · Answer 1 · 2021-08-02

The design matrix and contrast you have created are extremely artificial and I cannot see how they could be used for any real analysis. If you defined the design matrix in the usual way, by defining a group variable and then taking contrasts between group means, then the std.unscaled values from contrasts.fit() would be exact, regardless of whether there are precision weights or not.

Even in this worst case scenario, you have shown that the results are somewhat conservative for one gene. That's not the worst problem in the world. Despite the inexact standard error, limma would still be more powerful than an ordinary t-test in this scenario (assuming you have a real dataset and not just two rows of data).

You have described the results as "far off from the true value" but in fact a variation of 40% in the standard deviation is well with the bounds of error when estimating the variance on 3 residual degrees of freedom.

No, I do not see that it would be helpful to display a warning in this situation when (i) the user has not made a mistake, (ii) limma is still controlling the error rate correctly and (iii) limma is still better than the alternative. It is hard to see what possible gain could be achieved by such a warning. The issue is already noted on the contrasts.fit help page.