Question

bounded trended dispersion edgeR

1

Entering edit mode

Koen Van den Berge ▴ 180

@koen-van-den-berge-6369

Last seen 6 months ago

Ghent University, Belgium

In edgeR, one usually estimates a common, trended and tag wise dispersion in order to estimate the mean-variance trend over all genes.

I have noisy data with large BCVs. It is actually single-cell RNA-seq, and just for comparison, I would like to analyse it with edgeR. However, the common, trended and tag wise dispersion estimates are bounded. One can fix the bounded common dispersion with the 'interval' argument, but not the other two.

The estimateDisp function does not have these bounded estimates, but does not allow the inclusion of weights, which the other three functions above do.

Thus I am interested in an edgeR workflow which

(a) is not bounded in its dispersion estimates, and

(b) allows the inclusion of weights.

Is there any way I can relatively easy tweak the code to do this?

Thanks

edger • 1.3k views

ADD COMMENT • link updated 8.7 years ago by Aaron Lun ★ 28k • written 8.7 years ago by Koen Van den Berge ▴ 180

score 2 · Accepted Answer · 2016-03-03

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 20 hours ago

The city by the bay

I'm not sure what you mean by "bounded" dispersion estimates. All functions (including estimateDisp) use a grid search on the adjusted profile likelihood to estimate the NB dispersion, so the only bound is on the range of the grid. For estimateDisp, for example, the grid spans dispersion values from ~0.0001 to over 100 by default. The presence of an upper bound would only be a concern if you have NB dispersions over 100, and if that's the case, the variability is so high that I'd struggle to see what information that gene could provide for detecting DE. Even for noisy single-cell RNA-seq data, values of 1-10 seem to be more typical.

As for the weights, estimateDisp can be easily modified to support weights in the calls to adjustedProfileLik and glmFit. However, the sensibility of the weighting scheme would then become your responsibility. If I remember correctly, a larger weight for a count will scale down its variance and increase its influence on the GLM fit. Thus, your weights should probably be computed based on some measure of the variance, otherwise it wouldn't make much quantitative sense to scale the variances by some other unit of measurement.

ADD COMMENT • link 8.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron, thank you for the answer. What I meant by bounded is that the trended dispersion trend cannot exceed a BCV of 2 in the estimateGLMTrendedDisp function, at which it will asymptote when it actually should be higher.

Would the only modification for a weighted estimateDisp function be to add it in the `adjustedPRofileLik` and `glmFit` function? I will give this a try. Thank you for the recommendations!

ADD REPLY • link 8.7 years ago Koen Van den Berge ▴ 180

0

Entering edit mode

There's no theoretical or computational reason for why the BCV should asymptote at 2. The main difference between the methods concern the trend fitting algorithm, so any differences in behaviour are probably attributable to greater accuracy of the loess-based method in estimateDisp compared to the bin-spline method in estimateGLMTrendedDisp. Besides, unless you have oracular knowledge, how do you know the BCV should "actually" be higher?

P.S. I was thinking of asking Andy to add support for weights in the function directly, rather than you having to manually do it. Less chance of mistakes that way, but still, you need to be careful with what the weights actually represent.

ADD REPLY • link 8.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Dear Aaron

I have uploaded an example just to make things clear. You can find it at https://www.dropbox.com/s/7kl3e9fsjbwn1wl/bcvExample.pdf?dl=0 .

Please let me know if the function will be updated in order to cope with the weights. Thank you in advance.

ADD REPLY • link 8.7 years ago Koen Van den Berge ▴ 180