Search
Question: EdgeR: Prior degrees of freedom effect on tag wise dispersion
1
6 months ago by
Mr.RB10
Mr.RB10 wrote:

Background

​I'm currently trying to understand how RNA-seq analysis should be performed. I have read all manuals (+ the corresponding papers), however as I'm mostly concerned with bio-informatics regarding data cleaning and integration I find it hard to understand the statistics behind e.g. edgeR (If there are any recommendation (papers e.g.) to better understand these feel free to recommend some).

Why this question?

​As the prior degrees of freedom is a constant, and thus not automatically determined based on the data I was interested in its effects on the dispersion estimate. I read here that the prior is commonly calculated as:

50/(#samples−#groups)

When I do this for my data: 50/(4-2) = 25.

As this is substantially higher than the default (=10) I decided to see the effects of the prior df on my data, for which I used the following code:

library(edgeR)

groups <- as.factor(c("group1","group1","group2","group2"))
dg.list <- DGEList(counts=counts, group = groups)

# normalization
dg.list <- calcNormFactors(dg.list)

# dispersion estimates
dg.list <- estimateGLMCommonDisp(dg.list)
dg.list <- estimateGLMTrendedDisp(dg.list)
​
# effect of prior in tag wise dispersion estimates
design <- model.matrix(~0+group, data=dg.list$samples) colnames(design) <- levels(dg.list$samples\$group)

par(mfrow = c(3,2))
for (prior in c(10,15,20,25,30,35)) {
dg.list <- estimateGLMTagwiseDisp(dg.list, design, prior.df = prior)
fit <- glmFit(dg.list ,design)
gof(fit, plot = T, main = paste0('prior: ',prior))
}

The produced plot

​The question

As can be seen in this plot the genes in the top right are under the line with the default (=10) but above the line with a prior df of 25. Of course I could just pick the prior df best fitting the line but I assume this would not be the best practice. Thus my question is: is there a way to estimate the 'best' prior df or should I just stick with the defaults?

modified 6 months ago by Aaron Lun21k • written 6 months ago by Mr.RB10
2
6 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

There is no longer any need to set the prior d.f. manually. estimateDisp will automatically estimate a suitable prior d.f. for you, and has been doing so for several years. We are also recommending the QL framework:

A: edgeR glmLRT with test="F"?

https://f1000research.com/articles/5-1438/v2

... in which the prior d.f. for empirical Bayes shrinkage of the QL dispersions is also automatically estimated. There has also been work performed in making the prior d.f. estimation more robust to outliers, see:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373812/

Dear Aaron, thankyou for your reply! I did not know about the estimateDisp wrapper for the common, trend and tagwise dispersion estimates.

estimateDisp() and the associated publications are introduced in the first chapter of the edgeR User's Guide. estimateDisp() is the first piece of code introduced in Section 1.4 "Quick Start".

Do you have any recommendation to papers/websites/videos to understand why you should calculate these dispersions, and how these are used in the GLM fitting?