**0**wrote:

This is a low-level question, but I am very curious about the reason for limma calculating the residual standard deviation (`sigma`

) in `lmFit`

(or more precisely, `lm.series`

) as:

```
# 1. lm.series
sqrt(mean(out$effects[-(1:out$rank)]^2))
```

rather than:

```
# 2. what I would expect
x <- out$residuals
sqrt(sum((x - mean(x))^2) / (length(x) - out$rank))
```

Is there a numerical advantage of using the former, or some special case in which it would give different result?

Would anyone be able to provide a reference on this way of calculating the residual standard deviation?

In my comparisons, the two are very comparable (mean absolute difference on the order of 1e-17), though I do not have a good intuition for the maths of `effects`

.

Initially, I thought that it might be related to performance, but a simple benchmark suggests that the second option is actually faster:

- 690 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)
- 455 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Also, the latter gives results closer to the R's `stats::sigma`

function:

```
stats::sigma(lm(y ~ X))
```

with the mean absolute difference on the order of 1e-19 for my test dataset (1e-17 for the `lm.series`

implementation).

**38k**• written 29 days ago by krassowski.michal •

**0**