Search
Question: edger, trended or common dispersion
0
gravatar for tonja.r
23 months ago by
tonja.r30
United Kingdom
tonja.r30 wrote:

I am  bit in doubts if I should use estimateGLMCommonDisp(y,design) or estimateGLMTrendedDisp(y,design). There is however a difference in DE genes, not a big one but still. How should the ideal plots look like?
My BCV plots are following:

Trended dispersion estimated:

and only common dispersion:

ADD COMMENTlink modified 16 months ago by David ROUX0 • written 23 months ago by tonja.r30
3
gravatar for Ryan C. Thompson
23 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.1k wrote:

It seems pretty clear that there is a significant trend in your dispersions, so I would recommend using trended dispersion. However, you generally should just use estimateDisp instead of estimateGLMCommonDisp or estimateGLMTrendedDisp.

ADD COMMENTlink written 23 months ago by Ryan C. Thompson6.1k

Why should I use estimateDisp? In manual it is stated that I can alternatively use another functions:

y <- estimateDisp(y, design)

Alternatively, one can use the following calling sequence to estimate them one by one. To estimate common dispersion:

y <- estimateGLMCommonDisp(y, design)

To estimate trended dispersions:

y <- estimateGLMTrendedDisp(y, design)

To estimate tagwise dispersions:

y <- estimateGLMTagwiseDisp(y, design)
ADD REPLYlink written 23 months ago by tonja.r30
3

Because estimateDisp provides protection against zero fitted values, gives a more stable likelihood landscape (due to hot-starting at every step of the grid search), gives a more graduated trend from local fitting compared to binning, and can estimate the necessary prior degrees of freedom for EB shrinkage from the data. Oh, and it's under active development. I think that we're intending for estimateDisp to subsume the functionality of the other estimation functions... eventually.

ADD REPLYlink modified 23 months ago • written 23 months ago by Aaron Lun17k

If I use estimateDisp on another dataset instead of those three commands, I get following BCV plot:

and if I use those three commands instead, I get:


So, the first pic produced by estimateDisp seemed a bit strange to me, so I decided to avoid this function and use the three command instead. What do you think?

ADD REPLYlink modified 23 months ago • written 23 months ago by tonja.r30
1

There's no problem. As Aaron explained, the newer pipeline uses the data to estimate the prior degrees of freedom for the tagwise dispersions. The older pipeline simply uses 10 prior df unconditionally unless you tell it to use some other value. If you look at your DGEList object after running estimateDisp, it probably has a very large or even infinite prior df, indicating that there is insufficient evidence of gene-specific dispersions in your dataset, so that each gene simply uses the trend instead of getting its own dispersion. This is likely to happen if you have very few samples and/or low sequencing depth. In either case, the estimateDisp result is more honest about what information you do and do not have about each gene's dispersion.

ADD REPLYlink modified 23 months ago • written 23 months ago by Ryan C. Thompson6.1k

But what does it mean regarding the DE genes then if my BCV looks like the first picture? 

ADD REPLYlink written 23 months ago by tonja.r30

I'm not sure what you're asking. In general, higher prior d.f. means that you are able to share more information across genes. This improves the precision of the dispersion estimates and increases power to detect DE genes. There's no cause for alarm based on this plot; the mean-dispersion trend has a nice decreasing shape and the values on the y-axis are fairly low. The fact that every dispersion is shrunk to the trend just means that the genes in your data have dispersions that don't vary much from the trend. The only extra thing I can think of is to set robust=TRUE, to protect against overzealous shrinkage of outlier dispersions.

ADD REPLYlink modified 23 months ago • written 23 months ago by Aaron Lun17k

Just a small question regarding those functions. The RUVseq manual uses following:
  y <- estimateGLMCommonDisp(y, design)
  y <- estimateGLMTagwiseDisp(y, design)
Also, DiffBind does not use estimateDisp function in the edgeR analysis. 
What is the motivation that they use the functions separately then?

ADD REPLYlink written 23 months ago by tonja.r30

The code for those packages was likely written before the estimateDisp function was introduced into edgeR.

ADD REPLYlink written 23 months ago by Ryan C. Thompson6.1k

Would it be advisable to use glmQLFit over glmFit as I have only two replicates per condition?

ADD REPLYlink written 23 months ago by tonja.r30

That's fine. The whole point of information sharing via EB shrinkage (of the QL or NB dispersions) is to overcome problems due to low numbers of replicates. Obviously, though, the more replicates the better.

ADD REPLYlink written 23 months ago by Aaron Lun17k

As far as I understood the main difference between  QL and NB dispersions is that QL accounts for gene-specific variability, meaning the more replicates the better. If I have only two replicates, there is no point to use QL, right? Or when one should use QL and when NB?

ADD REPLYlink written 23 months ago by tonja.r30
2

EB shrinkage of both QL and NB dispersions can account for gene-specific variability. However, the QL approach accounts for the uncertainty of the estimates in a more statistically rigorous manner. This ensures that type I error control is maintained during testing. In contrast, shrinkage of the NB dispersions is a bit less rigorous; the NB distribution doesn't have a conjugate prior for the dispersion parameter, and the uncertainty of dispersion estimation is difficult to model.

Besides, you can still have gene-specific variability with two replicates. If one gene is variable between the two replicates, and another is not variable, then - behold - you have gene-specific variability.

ADD REPLYlink modified 23 months ago • written 23 months ago by Aaron Lun17k
0
gravatar for David ROUX
16 months ago by
David ROUX0
France (Avignon University)
David ROUX0 wrote:

In my case, if I use "estimateDisp" instead of the other functions I get  :

"Error in locfitByCol(loglik, covariate, span = span, degree = 0) : 
  locfit required but is not available"

And, I still do not realy know how to interpret the plot.

I really like the funny sentence of Aaron : "a nice decreasing shape " :-) 

But what to expect objectively ?

Here are my script and plot...

 

y <- estimateCommonDisp(y, design)
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
plotBCV(y) 

 

 

ADD COMMENTlink written 16 months ago by David ROUX0

In the future, if you have a new question, please post a new question instead of posting it as an answer to another question. But to answer, that error message is pretty clearly telling you that you haven't installed the locfit package and should do so in order to use estimateDisp.

ADD REPLYlink written 16 months ago by Ryan C. Thompson6.1k

Ok, thanks for your response. David.

ADD REPLYlink modified 16 months ago • written 16 months ago by David ROUX0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 158 users visited in the last hour