Search
Question: EdgeR: Prior degrees of freedom effect on tag wise dispersion
1
gravatar for r.beeloo
5 weeks ago by
r.beeloo10
r.beeloo10 wrote:

 

Background

​I'm currently trying to understand how RNA-seq analysis should be performed. I have read all manuals (+ the corresponding papers), however as I'm mostly concerned with bio-informatics regarding data cleaning and integration I find it hard to understand the statistics behind e.g. edgeR (If there are any recommendation (papers e.g.) to better understand these feel free to recommend some).

Why this question?

​As the prior degrees of freedom is a constant, and thus not automatically determined based on the data I was interested in its effects on the dispersion estimate. I read here that the prior is commonly calculated as: 

50/(#samples−#groups)

When I do this for my data: 50/(4-2) = 25.

As this is substantially higher than the default (=10) I decided to see the effects of the prior df on my data, for which I used the following code:

library(edgeR)

# loading data
groups <- as.factor(c("group1","group1","group2","group2"))
dg.list <- DGEList(counts=counts, group = groups)

# normalization
dg.list <- calcNormFactors(dg.list)

# dispersion estimates
dg.list <- estimateGLMCommonDisp(dg.list)
dg.list <- estimateGLMTrendedDisp(dg.list)
​
# effect of prior in tag wise dispersion estimates
design <- model.matrix(~0+group, data=dg.list$samples)
colnames(design) <- levels(dg.list$samples$group)

par(mfrow = c(3,2))
for (prior in c(10,15,20,25,30,35)) {
  dg.list <- estimateGLMTagwiseDisp(dg.list, design, prior.df = prior)
  fit <- glmFit(dg.list ,design)
  gof(fit, plot = T, main = paste0('prior: ',prior))
}

The produced plot

​The question

As can be seen in this plot the genes in the top right are under the line with the default (=10) but above the line with a prior df of 25. Of course I could just pick the prior df best fitting the line but I assume this would not be the best practice. Thus my question is: is there a way to estimate the 'best' prior df or should I just stick with the defaults?

 

 

ADD COMMENTlink modified 5 weeks ago by Aaron Lun19k • written 5 weeks ago by r.beeloo10
2
gravatar for Aaron Lun
5 weeks ago by
Aaron Lun19k
Cambridge, United Kingdom
Aaron Lun19k wrote:

There is no longer any need to set the prior d.f. manually. estimateDisp will automatically estimate a suitable prior d.f. for you, and has been doing so for several years. We are also recommending the QL framework:

A: edgeR glmLRT with test="F"?

https://f1000research.com/articles/5-1438/v2

... in which the prior d.f. for empirical Bayes shrinkage of the QL dispersions is also automatically estimated. There has also been work performed in making the prior d.f. estimation more robust to outliers, see:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/

ADD COMMENTlink written 5 weeks ago by Aaron Lun19k

Dear Aaron, thankyou for your reply! I did not know about the estimateDisp wrapper for the common, trend and tagwise dispersion estimates. 

 

ADD REPLYlink written 5 weeks ago by r.beeloo10

estimateDisp() and the associated publications are introduced in the first chapter of the edgeR User's Guide. estimateDisp() is the first piece of code introduced in Section 1.4 "Quick Start".

ADD REPLYlink written 4 weeks ago by Gordon Smyth33k

Do you have any recommendation to papers/websites/videos to understand why you should calculate these dispersions, and how these are used in the GLM fitting? 

ADD REPLYlink written 5 weeks ago by r.beeloo10

Section 1.2 of the edgeR user's guide contains brief summaries of the relevant papers.

ADD REPLYlink written 5 weeks ago by Aaron Lun19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 147 users visited in the last hour