Question

edgeR quasi-likelihood dispersion estimates

0

Entering edit mode

veronique.storme • 0

@veroniquestorme-12161

Last seen 7.0 years ago

Dear,

I am using the (robust) QL pipeline in edgeR with following lines of code

> estimateDisp

> glmQLFit

I understand that with this pipeline, only the NB trended dispersion is used from the estimateDisp function, and that glmQLFit estimates QL gene-wise dispersions using the empirical Bayes approach.

At which point, or which function calculates the var(y_gi) = sigma^2_g(mu_gi + mu^2_gi phi where phi is NB trended dispersion and sigma^2_g the QL genewise dispersion?

Thanks,

Veronique

edger • 2.1k views

ADD COMMENT • link updated 7.3 years ago by Aaron Lun ★ 28k • written 7.3 years ago by veronique.storme • 0

0

Entering edit mode

Do you mind if I ask why you're asking this question? You are assuming that explicit computation of the variance function is necessary, but actually that is never required. And of course you don't need to know that sort of thing to use the software.

ADD REPLY • link 7.3 years ago Gordon Smyth 50k

0

Entering edit mode

Dear Gordon, I not only wish to use the software, I also want to understand the models behind it. I have read all the literature behind it and like to understand what exactly is happening. I read about this variance function and was wondering why it was never calculated, but indeed - if I understand the paper from Lund correctly - for the LRT test and the QL-F-test, only the dispersion needs to be estimated.

ADD REPLY • link 7.3 years ago veronique.storme • 0

score 3 · Answer 1 · 2017-01-12

3

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 18 hours ago

The city by the bay

Neither function explicitly calculates the variance term. The NB dispersion is estimated by maximizing the Cox-Reid adjusted profile likelihood - or more specifically, by maximizing the CR-APL smoothed across abundances to get the trended dispersion. The QL dispersion is estimated from the total residual deviance of the fitted GLM, with empirical Bayes shrinkage to stabilize the estimates. The equation in your post describes how the variance of each count is modelled, but it's not actually used in the code itself.

Technical note: if you really want to know, a working variance "var(y_gi) = mu_gi + mu^2_gi phi" (without the QL dispersion) is computed at each iteration of GLM fitting for each gene. Both estimateDisp and glmQLFit call glmFit, so I guess this calculation could be considered as part of those functions. Note that these values are only used as weights for least-squares fitting, not to estimate the dispersions themselves.

ADD COMMENT • link 7.3 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks a lot! And the LRT test and the QL F-test only depend on the QL gene-wise dispersion (the posterior variance, object var.post in edegR) (as proven by McCullagh 1983) ?

ADD REPLY • link 7.3 years ago veronique.storme • 0

1

Entering edit mode

Both functions still need the NB dispersion for GLM fitting - the tagwise NB dispersion for the LRT (by default, in glmLRT), and the trended NB dispersion in the QL F-test. The LRT doesn't use the QL dispersion at all. Also, I don't think that McCullagh describes hypothesis testing for quasi-likelihood models in that paper.

ADD REPLY • link 7.3 years ago Aaron Lun ★ 28k