Question: Limma linear fit residual standard deviation calculation
0
krassowski.michal0 wrote:

This is a low-level question, but I am very curious about the reason for limma calculating the residual standard deviation (sigma) in lmFit (or more precisely, lm.series) as:

# 1. lm.series
sqrt(mean(out$effects[-(1:out$rank)]^2))

rather than:

# 2. what I would expect
x <- out$residuals sqrt(sum((x - mean(x))^2) / (length(x) - out$rank))
1. Is there a numerical advantage of using the former, or some special case in which it would give different result?

2. Would anyone be able to provide a reference on this way of calculating the residual standard deviation?

In my comparisons, the two are very comparable (mean absolute difference on the order of 1e-17), though I do not have a good intuition for the maths of effects.

Initially, I thought that it might be related to performance, but a simple benchmark suggests that the second option is actually faster:

1. 690 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2. 455 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Also, the latter gives results closer to the R's stats::sigma function:

stats::sigma(lm(y ~ X))

with the mean absolute difference on the order of 1e-19 for my test dataset (1e-17 for the lm.series implementation).

Answer: Limma linear fit residual standard deviation calculation
3
12 weeks ago by
Gordon Smyth38k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth38k wrote:

As you've already confirmed, limma gives the correct answer.

Your formula for sigma would fail if y contains NA or if X does not span the intercept term. However a slight correction:

sum(out$residuals^2, na.rm=TRUE) / out$df.residual

would work and could equally have been used by limma. The formula that limma uses agrees with how one would compute sigma at the C level but, at the R level, it gets slowed down very slightly by having to subset the effects vector.

If you want to understand what the effects are, you might start with these references:

Mathematically, the effects are Q^T y where Q is from the QR decomposition of X / sqrt(weights).

BTW, I do not normally answer questions about limma code, only about how limma behaves at the user level. I've made an exception here because you're a student, but time just doesn't allow me to do that on a regular basis.

ADD COMMENTlink modified 11 weeks ago • written 12 weeks ago by Gordon Smyth38k

Thank you for taking the time to answer this question, I really appreciate it. I find the second reference very useful. Thank you!