Question

Confidence intervals on edgeR logFC

4

Entering edit mode

ri.lars ▴ 40

@rilars-6776

Last seen 6.4 years ago

Is it possible to calculate a confidence interval on the log fold change reported in the output of edgeR?

I suppose that knowing the number of reads in a gene and the tagwise dispersion one can find the total CV^2 for that gene, as 1/reads + dispersion, but where to go from there?

As an alternative would it be possible estimate the confidence interval by bootstrapping. Simply resampling from the samples, fitting the model and calculating logFC a few thousand times?

edgeR statistics confidence interval • 9.2k views

ADD COMMENT • link updated 9.0 years ago by nwvidal • 0 • written 10.6 years ago by ri.lars ▴ 40

0

Entering edit mode

nwvidal • 0

@nwvidal-10055

Last seen 9.0 years ago

I would like to show "trends" of gene expression for individual genes of particular interest along a time series (3 points) using edgeR logcpm values. I have three samples in every condition and would like to show confidence intervals in the graph. (This kind of graph is usually seen using RPKM or FPKM values)

Does the same rationale of edgeR logFC applies for edgeR logcpm values?
I mean, I cannot use the standard formula (as for normal or t distribution) to calculate confidence intervals for logcpm. Is there a reliable way to calculate confidence intervals for logcpm values with 3 replicates?

ADD COMMENT • link 9.0 years ago nwvidal • 0

0

Entering edit mode

It's more usual to plot standard error bars rather than confidence intervals on a log-CPM or logRPKM plot. With only three points, many journals will (very sensibly) ask you to plot all the individual values rather than showing mean+-SE.

You can use the usual undergraduate formula for the standard error of the mean to make a plot, but keep in mind that this simple standard error will tend to be more conservative than the inference done by edgeR, i.e., the differences between times will usually appear to be less significant than they actually are. You can't represent edgeR weighted empirical Bayes inference as a standard error.

ADD REPLY • link 9.0 years ago Gordon Smyth 52k

score 15 · Accepted Answer · 2014-09-24

We have no plans to offer confidence intervals for logFCs from edgeR.

There is actually a standard formula for the standard errors for the coefficients in any generalized linear model and, in principle, this could easily be computed in edgeR. We don't plan to do this however for two reasons:

1. Standard errors for count-based glms (binomial, Poisson, negative binomial) are notorious unreliable. We showed in the earliest edgeR paper (Robinson and Smyth, Biostatistics, 2008) that the t-tests obtained by dividing coefficients by their SEs are not even consistent (they do not necessarily become more significant as the effect size increases). We don't wish to include misleading information in the edgeR output. Likelihood ratio tests and exact tests are far, far more reliable.

2. The logFCs returned by edgeR are actually shrunk towards zero and have smaller SEs than ordinary glm coefficients would have. How to compute SEs for shrunk logFCs is not clear.

To compute bootstrap intervals, you would need a reasonably large number of samples. There is also a theoretical problem arising with the fact that the different RNA samples are not exchangeable in that they are associated with different sequence depths.

If you really do need confidence intervals for some reason, then I would suggest that you use voom and limma instead of edgeR. SEs are reliable for Gaussian models, so limma is able to return reliable confidence intervals for coefficients. Just set confint=TRUE when running topTable().