Variation in the estimation of the common dispersion
hz_67 • 0
@hz_67-22039

I have been using edgeR for a while and recently noticed that the common dispersion estimated by estimateGLMCommonDisp() varies slightly with the row order of the count input (gene ID order). If I run the same analysis on exactly the same input table but change the row order, e.g. with genes ordered by either decreasing or increasing mean count across samples, I obtain slightly different dispersion values and p-values. As an example, I found 0.0260375, 0.02603552 and 0.02603565 as the common dispersion for the same input table with three different row orders. The difference is very small, but what could be the reason for this variation?

edger common dispersion • 251 views
@gordon-smyth
WEHI, Melbourne, Australia

By default, estimateGLMCommonDisp() takes a systematic sample of 10,000 rows from which to compute the common dispersion. If you change the order of the genes in y, then the selection of genes can in some circumstances change slightly, and this will change the estimated dispersion. As your post shows, the variation is so small as to be of no consequence but, if you want, you can avoid any variation by setting subset=Inf.
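
As a rough illustration of the subset=Inf suggestion, here is a minimal sketch on simulated data. The counts, the two-group design and the object names (counts, y_shuffled, d1, d2) are hypothetical stand-ins, not the data from the question. With subset=Inf every gene is used, so the two estimates should agree up to numerical tolerance.

```r
library(edgeR)

set.seed(1)

# Hypothetical simulated counts: 12,000 genes x 6 samples in two groups.
# More than 10,000 rows, so the default subset = 10000 would subsample.
ngenes <- 12000
counts <- matrix(rnbinom(ngenes * 6, mu = 50, size = 1 / 0.05), nrow = ngenes)
rownames(counts) <- paste0("gene", seq_len(ngenes))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Same data with the rows in a random order
y_shuffled <- y[sample(ngenes), ]

# subset = Inf: use all rows rather than a systematic sample
d1 <- estimateGLMCommonDisp(y, design, subset = Inf)$common.dispersion
d2 <- estimateGLMCommonDisp(y_shuffled, design, subset = Inf)$common.dispersion

print(c(original = d1, shuffled = d2))
print(all.equal(d1, d2))
```

Dropping subset = Inf from both calls restores the default behaviour, in which the two estimates may differ in the last few digits, as in the question.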

In recent years, we have been recommending estimateDisp instead of estimateGLMCommonDisp. estimateDisp doesn't subset the genes so the dispersion estimates will be unaffected by changes in the row order.
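
A similar sketch for estimateDisp, again on hypothetical simulated counts rather than the data from the question. It estimates the dispersions before and after shuffling the rows and compares the common dispersion; the two values should match up to numerical tolerance.

```r
library(edgeR)

set.seed(2)

# Hypothetical simulated counts: 5,000 genes x 6 samples in two groups
ngenes <- 5000
counts <- matrix(rnbinom(ngenes * 6, mu = 50, size = 1 / 0.05), nrow = ngenes)
rownames(counts) <- paste0("gene", seq_len(ngenes))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# Same data with the rows in a random order
y_shuffled <- y[sample(ngenes), ]

# estimateDisp adds common, trended and tagwise dispersions to the object
fit <- estimateDisp(y, design)
fit_shuffled <- estimateDisp(y_shuffled, design)

print(c(original = fit$common.dispersion,
        shuffled = fit_shuffled$common.dispersion))
print(all.equal(fit$common.dispersion, fit_shuffled$common.dispersion))
```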
