How to determine which edgeR glmFit and glmQLFit are better
2
1
Entering edit mode
@changjianghxu-9498
Last seen 8.3 years ago

Hi,

I would determine which of the functions glmFit (glmLRT) and glmQLFit (glmQLFTest) should be used for my RNAseq data using function gof, as follows:

f1 <- glmFit(y, design); g1 <- gof(f1, plot = T)

f2 <- glmQLFit(y, design); g2 <- gof(f2, plot = T)

And then compare the QQ-plots of g1 and g2. My questions are

1. The above is right or not? Should the two functions work on the same y, where y is DGEList object with dispersion estimates?

2. Will the dispersion parameters are re-estimated by glmQLFit? Should the dispersion estimates be different for glmFit and glmQLFit?

3. How to know which is better by comparing the QQ-plots of g1 and g2?

Thanks.

Changjiang

 

edger • 9.5k views
ADD COMMENT
6
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 8 hours ago
The city by the bay
  1. gof assumes that the total residual deviance of the fitted GLM follows a chi-squared distribution on the residual degrees of freedom. This is not the case in the quasi-likelihood framework where the deviance is scaled by a gene-specific QL dispersion. Thus, it doesn't seem appropriate to apply gof on f2 - at the very least, you shouldn't compare gof results between f1 and f2, because they aren't really comparable. In general, I would predict that the QL fit will look "worse", but that's because gof doesn't account for presence of the QL dispersion in the model.
  2. glmQLFit uses the trended NB dispersion to fit a GLM, and then estimates the QL dispersion from the deviance. It doesn't re-estimate the NB dispersion but instead estimates an entirely different QL dispersion for each gene. In contrast, glmFit uses the tagwise NB dispersion to fit a GLM - and that's it.
  3. You probably won't be able to tell from the gof results. What I can tell you is that glmQLFit will provide more accurate type I error rate control as it accounts for the uncertainty of the dispersion estimates in a more rigorous manner than the glmFit pipeline. As such, I use it routinely for my analyses.
ADD COMMENT
0
Entering edit mode

Thanks Aaron. I see gof only works for glmFit. How can I do goodness of fitting test for glmQLFit?

Changjiang

ADD REPLY
1
Entering edit mode

Well, I'm not sure that it makes all that much sense to do so. Any irregularity in the initial GLM fit will manifest as a change in the distribution of the deviances, but such changes will end up being modelled by glmQLFit via estimation of f2$var.prior or f2$df.prior. Any (deviance-based) goodness-of-fit testing that includes the QL dispersion would effectively cancel itself out, because the purpose of the QL dispersion is to model any "non-goodness of fit", so to speak.

ADD REPLY
0
Entering edit mode

Aaron is right. You can't do a goodness of fit plot for a QL model fit -- the whole concept of goodness of fit doesn't apply because QL automatically fits dispersions at the gene level.

ADD REPLY
3
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

The goodness of fit plot done by gof() is actually only useful for evaluating models glm models with constant or trended dispersion. The plot is designed to evaluate whether these global dispersion models are adequate --- if not then tagwise dispersions are required. The plot doesn't make sense for a glm tagwise dispersion model and it isn't even defined for a QL model.

So the gof() plot can only be used to decide whether a global dispersion model is adequate. It can't be used to choose between glm-tagwise vs a QL model. By definition, both of these models are adequate from a gof() point of view.

As Aaron has already explained, if you are seriously worried about assumptions, then QL is an obvious choice. It is based on a more conservative model and makes fewer assumptions than does glmLRT().

I have to admit that the gof() help page doesn't explain this properly.

ADD COMMENT
0
Entering edit mode

Thank you very much for the details. glmQLFTest seems too conservative for my data because the BH FDR values all equal to 1. Then I would know whether or not glmLRT may be used instead of glmQLFTest.

Changjiang

ADD REPLY
0
Entering edit mode

glmLRT is a very well regarded method. It can be a little liberal in some circumstances, but only slightly. Why would you not be able to use it?

ADD REPLY
0
Entering edit mode

It is no problem to run both glmLRT and glmQLFTest. Just want to know how to decide which is better or more suitable for my data. Thanks.

Changjiang

ADD REPLY
0
Entering edit mode

Well, I think that we've already explained how to decide. If you want to be conservative and safe, use QL. If you want to be a bit more aggressive and find all the DE genes you can, still with good FDR control, then use glmLRT. You have already said that you prefer the latter, so use glmLRT.

ADD REPLY
0
Entering edit mode

Thanks so much for all the replies. Really helpful!

Changjiang

ADD REPLY

Login before adding your answer.

Traffic: 800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6