Search
Question: edger, trended or common dispersion
0
2.6 years ago by
tonja.r40
United Kingdom
tonja.r40 wrote:

I am  bit in doubts if I should use estimateGLMCommonDisp(y,design) or estimateGLMTrendedDisp(y,design). There is however a difference in DE genes, not a big one but still. How should the ideal plots look like?
My BCV plots are following:

Trended dispersion estimated:

and only common dispersion:

modified 2.0 years ago by David ROUX0 • written 2.6 years ago by tonja.r40
3
2.6 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

It seems pretty clear that there is a significant trend in your dispersions, so I would recommend using trended dispersion. However, you generally should just use estimateDisp instead of estimateGLMCommonDisp or estimateGLMTrendedDisp.

Why should I use estimateDisp? In manual it is stated that I can alternatively use another functions:

y <- estimateDisp(y, design)


Alternatively, one can use the following calling sequence to estimate them one by one. To estimate common dispersion:

y <- estimateGLMCommonDisp(y, design)


To estimate trended dispersions:

y <- estimateGLMTrendedDisp(y, design)


To estimate tagwise dispersions:

y <- estimateGLMTagwiseDisp(y, design)

3

Because estimateDisp provides protection against zero fitted values, gives a more stable likelihood landscape (due to hot-starting at every step of the grid search), gives a more graduated trend from local fitting compared to binning, and can estimate the necessary prior degrees of freedom for EB shrinkage from the data. Oh, and it's under active development. I think that we're intending for estimateDisp to subsume the functionality of the other estimation functions... eventually.

If I use estimateDisp on another dataset instead of those three commands, I get following BCV plot:

and if I use those three commands instead, I get:

So, the first pic produced by estimateDisp seemed a bit strange to me, so I decided to avoid this function and use the three command instead. What do you think?

1

There's no problem. As Aaron explained, the newer pipeline uses the data to estimate the prior degrees of freedom for the tagwise dispersions. The older pipeline simply uses 10 prior df unconditionally unless you tell it to use some other value. If you look at your DGEList object after running estimateDisp, it probably has a very large or even infinite prior df, indicating that there is insufficient evidence of gene-specific dispersions in your dataset, so that each gene simply uses the trend instead of getting its own dispersion. This is likely to happen if you have very few samples and/or low sequencing depth. In either case, the estimateDisp result is more honest about what information you do and do not have about each gene's dispersion.

But what does it mean regarding the DE genes then if my BCV looks like the first picture?

I'm not sure what you're asking. In general, higher prior d.f. means that you are able to share more information across genes. This improves the precision of the dispersion estimates and increases power to detect DE genes. There's no cause for alarm based on this plot; the mean-dispersion trend has a nice decreasing shape and the values on the y-axis are fairly low. The fact that every dispersion is shrunk to the trend just means that the genes in your data have dispersions that don't vary much from the trend. The only extra thing I can think of is to set robust=TRUE, to protect against overzealous shrinkage of outlier dispersions.

Just a small question regarding those functions. The RUVseq manual uses following:
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
Also, DiffBind does not use estimateDisp function in the edgeR analysis.
What is the motivation that they use the functions separately then?

The code for those packages was likely written before the estimateDisp function was introduced into edgeR.

Would it be advisable to use glmQLFit over glmFit as I have only two replicates per condition?

That's fine. The whole point of information sharing via EB shrinkage (of the QL or NB dispersions) is to overcome problems due to low numbers of replicates. Obviously, though, the more replicates the better.

As far as I understood the main difference between  QL and NB dispersions is that QL accounts for gene-specific variability, meaning the more replicates the better. If I have only two replicates, there is no point to use QL, right? Or when one should use QL and when NB?

2

EB shrinkage of both QL and NB dispersions can account for gene-specific variability. However, the QL approach accounts for the uncertainty of the estimates in a more statistically rigorous manner. This ensures that type I error control is maintained during testing. In contrast, shrinkage of the NB dispersions is a bit less rigorous; the NB distribution doesn't have a conjugate prior for the dispersion parameter, and the uncertainty of dispersion estimation is difficult to model.

Besides, you can still have gene-specific variability with two replicates. If one gene is variable between the two replicates, and another is not variable, then - behold - you have gene-specific variability.

0
2.0 years ago by
France (Avignon University)
David ROUX0 wrote:

In my case, if I use "estimateDisp" instead of the other functions I get  :

"Error in locfitByCol(loglik, covariate, span = span, degree = 0) :
locfit required but is not available"

And, I still do not realy know how to interpret the plot.

I really like the funny sentence of Aaron : "a nice decreasing shape " :-)

But what to expect objectively ?

Here are my script and plot...

y <- estimateCommonDisp(y, design)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
plotBCV(y) 

In the future, if you have a new question, please post a new question instead of posting it as an answer to another question. But to answer, that error message is pretty clearly telling you that you haven't installed the locfit package and should do so in order to use estimateDisp.