estimateDisp runs forever but not the trio estimateGLM
Daniel ▴ 10
@daniel-6619
Last seen 2.0 years ago
Finland

Hello,

The call "y = estimateDisp(y, design, robust = TRUE)" runs seemingly forever, even though the trio of estimateGLM functions runs just fine (I am able to finish the DE analysis with them, using glmFit and glmLRT).

Therefore, is it safe to replace "y = estimateDisp(y, design, robust = TRUE)" with the estimateGLM trio when using glmQLFit and glmQLFTest?

Cheers,

Daniel

edger estimateDisp • 764 views
Aaron Lun ★ 27k
@alun
Last seen 10 hours ago
The city by the bay

That's strange, estimateDisp should be faster than the trio. Also, most of the GLM fitting code is shared, so it's odd that one would run forever and the others would be faster. Here's a couple of things to check:

• Make sure that there are no libraries with zero library sizes/non-finite offsets.
• Have you filtered out low abundance genes?
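Both checks can be sketched in a few lines of R. This is only an illustration on simulated counts: the matrix, the group labels and the filtering threshold (CPM above 1 in at least 3 libraries) are assumptions, not values taken from the data in question.

```r
library(edgeR)

# Simulated counts stand in for the real data (hypothetical: 1000 genes,
# 6 samples; the first 300 genes are given a very low mean on purpose).
set.seed(1)
mu_gene <- rep(c(0.2, 20), times = c(300, 700))
counts <- matrix(rnbinom(6000, mu = rep(mu_gene, 6), size = 5), nrow = 1000)
group <- factor(rep(c("ctrl", "trt"), each = 3))
y <- DGEList(counts = counts, group = group)

# Check 1: no library has a zero library size.
stopifnot(all(y$samples$lib.size > 0))

# Check 2: all offsets are finite.
y <- calcNormFactors(y)
stopifnot(all(is.finite(getOffset(y))))

# Check 3: drop low-abundance genes. The threshold is an assumption and
# should be adapted to the smallest group size in the real design.
keep <- rowSums(cpm(y) > 1) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]
dim(y)
```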

If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.
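A minimal sketch of that debugging session, assuming y and design are the objects from the question. The slow call is guarded with interactive() because the browser needs a console. In edgeR releases where estimateDisp is an S3 generic, it may be necessary to flag the method that does the actual work rather than the generic.

```r
library(edgeR)

# Flag estimateDisp so that the next call runs line by line in the browser.
debug(estimateDisp)
stopifnot(isdebugged(estimateDisp))

# In an interactive session, rerun the slow call and step with 'n'
# (next line) until a step stalls; 'Q' quits the browser.
# y and design are assumed to be the objects from the question.
if (interactive()) {
  y <- estimateDisp(y, design, robust = TRUE)
}

# Remove the flag once the stalling step has been identified.
undebug(estimateDisp)
```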

As for your other question, we would prefer that you use estimateDisp rather than the trio, as the former is more up-to-date. See C: edger, trended or common dispersion for more details.
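For reference, the recommended workflow might look like the sketch below. The counts, the two-group design and the tested coefficient are simulated placeholders, so only the sequence of calls is meant to carry over to the real analysis.

```r
library(edgeR)

# Simulated data (hypothetical: 1000 genes, 3 controls vs 3 treated).
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# One call estimates the common, trended and tagwise dispersions.
y <- estimateDisp(y, design, robust = TRUE)

# Quasi-likelihood fit and F-test on the treatment coefficient.
fit <- glmQLFit(y, design, robust = TRUE)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```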


> That's strange, estimateDisp should be faster than the trio.

In my case it is the other way around: estimateDisp runs for ~30 minutes, while the estimateGLM trio runs in less than 2 minutes.

> it's odd that one would run forever and the others would be faster

I agree.

I use edgeR_3.12.0

> Make sure that there are no libraries with zero library sizes/non-finite offsets.

I checked and this is not the case.

> Have you filtered out low abundance genes?

Of course.

> If you can, call debug(estimateDisp) and step through the function until you get to the part that stalls; this would be helpful for us to figure out what's going on.

I need to look into this.

As I stated in my previous post, if I use the "old" approach (i.e. the estimateGLM trio, glmFit and glmLRT), everything runs quickly and smoothly. On the same data (with the same contrasts, designs and filtering), if I switch to the QL F-test approach (i.e. estimateDisp, glmQLFit and glmQLFTest), it also works, except that "y = estimateDisp(y, design, robust = TRUE)" takes ~30 minutes. I do have a very complex design (e.g. several time points, several batches, several treatments, several controls, etc.), so the low-count filtering step cannot be as effective as in a simple comparison of one treatment versus one control; some genes may still have low counts in only some groups of samples or time points.
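One way to put numbers on the comparison is to time both routes on the same object with system.time. The sketch below uses small simulated data with a simple two-group design, so it is not expected to reproduce the ~30 minute runtime; it only shows how the two timings could be collected side by side on the real y and design.

```r
library(edgeR)

# Simulated stand-in for the real data (hypothetical: 1000 genes, 6 samples).
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000)
group <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ group)
y <- calcNormFactors(DGEList(counts = counts, group = group))

# Route 1: single call to estimateDisp.
t_disp <- system.time(
  y1 <- estimateDisp(y, design, robust = TRUE)
)

# Route 2: the three estimateGLM*Disp functions in sequence.
t_trio <- system.time({
  y2 <- estimateGLMCommonDisp(y, design)
  y2 <- estimateGLMTrendedDisp(y2, design)
  y2 <- estimateGLMTagwiseDisp(y2, design)
})

# Elapsed wall-clock seconds for each route.
print(c(estimateDisp = t_disp[["elapsed"]], trio = t_trio[["elapsed"]]))
```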