This is a low-level question, but I am very curious about the reason for limma calculating the residual standard deviation in `lmFit` (or, more precisely, in `lm.series`) the way it does:

```r
# 1. what lm.series does
sqrt(mean(out$effects[-(1:out$rank)]^2))

# 2. what I would expect
x <- out$residuals
sqrt(sum((x - mean(x))^2) / (length(x) - out$rank))
```
Is there a numerical advantage to using the former, or some special case in which it would give a different result?
Would anyone be able to provide a reference on this way of calculating the residual standard deviation?
In my comparisons, the two are very comparable (mean absolute difference on the order of 1e-17), though I do not have a good intuition for the maths of the `effects`-based version.
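For what it's worth, the agreement seems to follow from how `lm.fit` constructs `effects`: it is `t(Q) %*% y` from the QR decomposition of the design matrix, so the components beyond the rank span the residual space and carry exactly the residual sum of squares. Here is a minimal sketch of that identity (in NumPy rather than R, with illustrative names, and assuming a full-rank design that includes an intercept so the residuals sum to zero):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# design matrix with an intercept column, full column rank
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)
rank = X.shape[1]

# full QR decomposition: X = Q @ R with Q an n x n orthogonal matrix
Q, _ = np.linalg.qr(X, mode="complete")

# R's lm.fit()$effects is t(Q) %*% y
effects = Q.T @ y

# 1. the effects-based estimate: the components of effects beyond the
#    rank lie in the residual space, so mean(effects[rank:]^2) = RSS / (n - rank)
sigma_effects = np.sqrt(np.mean(effects[rank:] ** 2))

# 2. the residual-based estimate
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma_resid = np.sqrt(np.sum((resid - resid.mean()) ** 2) / (n - rank))

print(abs(sigma_effects - sigma_resid))  # agrees to machine precision
```

(Since the effects enter only through their squares, the sign conventions of the QR factorization do not matter.)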
Initially, I thought that it might be related to performance, but a simple benchmark suggests that the second option is actually faster:
- Option 1 (effects-based): 690 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)
- Option 2 (residual-based): 455 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Also, the latter gives results closer to R's `stats::sigma(lm(y ~ X))`, with a mean absolute difference on the order of 1e-19 for my test dataset (1e-17 for the `effects`-based version).