Limma linear fit residual standard deviation calculation
1
0
Entering edit mode
@krassowskimichal-19240
Last seen 4.6 years ago

This is a low-level question, but I am very curious about the reason for limma calculating the residual standard deviation (sigma) in lmFit (or more precisely, lm.series) as:

# 1. lm.series
sqrt(mean(out$effects[-(1:out$rank)]^2))

rather than:

# 2. what I would expect
x <- out$residuals
sqrt(sum((x - mean(x))^2) / (length(x) - out$rank))
  1. Is there a numerical advantage of using the former, or some special case in which it would give different result?

  2. Would anyone be able to provide a reference on this way of calculating the residual standard deviation?

In my comparisons, the two are very comparable (mean absolute difference on the order of 1e-17), though I do not have a good intuition for the maths of effects.

Initially, I thought that it might be related to performance, but a simple benchmark suggests that the second option is actually faster:

  1. 690 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)
  2. 455 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Also, the latter gives results closer to the R's stats::sigma function:

stats::sigma(lm(y ~ X))

with the mean absolute difference on the order of 1e-19 for my test dataset (1e-17 for the lm.series implementation).

limma • 1.1k views
ADD COMMENT
3
Entering edit mode
@gordon-smyth
Last seen 9 hours ago
WEHI, Melbourne, Australia

As you've already confirmed, limma gives the correct answer.

Your formula for sigma would fail if y contains NA or if X does not span the intercept term. However a slight correction:

sum(out$residuals^2, na.rm=TRUE) / out$df.residual

would work and could equally have been used by limma. The formula that limma uses agrees with how one would compute sigma at the C level but, at the R level, it gets slowed down very slightly by having to subset the effects vector.

If you want to understand what the effects are, you might start with these references:

http://www.statsci.org/smyth/pubs/EoB/ban041-.pdf https://onlinelibrary.wiley.com/doi/full/10.1002/0470011815.b2a14022

Mathematically, the effects are Q^T y where Q is from the QR decomposition of X / sqrt(weights).

BTW, I do not normally answer questions about limma code, only about how limma behaves at the user level. I've made an exception here because you're a student, but time just doesn't allow me to do that on a regular basis.

ADD COMMENT
0
Entering edit mode

Thank you for taking the time to answer this question, I really appreciate it. I find the second reference very useful. Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 398 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6