Confidence intervals on edgeR logFC
2
4
Entering edit mode
ri.lars ▴ 40
@rilars-6776
Last seen 2.6 years ago

Is it possible to calculate a confidence interval on the log fold change reported in the output of edgeR?

I suppose that knowing the number of reads in a gene and the tagwise dispersion one can find the total CV^2 for that gene, as 1/reads + dispersion, but where to go from there?

As an alternative would it be possible estimate the confidence interval by bootstrapping. Simply resampling from the samples, fitting the model and calculating logFC a few thousand times?

edgeR statistics confidence interval • 5.5k views
15
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

We have no plans to offer confidence intervals for logFCs from edgeR.

There is actually a standard formula for the standard errors for the coefficients in any generalized linear model and, in principle, this could easily be computed in edgeR. We don't plan to do this however for two reasons:

1. Standard errors for count-based glms (binomial, Poisson, negative binomial) are notorious unreliable. We showed in the earliest edgeR paper (Robinson and Smyth, Biostatistics, 2008) that the t-tests obtained by dividing coefficients by their SEs are not even consistent (they do not necessarily become more significant as the effect size increases). We don't wish to include misleading information in the edgeR output. Likelihood ratio tests and exact tests are far, far more reliable.

2. The logFCs returned by edgeR are actually shrunk towards zero and have smaller SEs than ordinary glm coefficients would have. How to compute SEs for shrunk logFCs is not clear.

To compute bootstrap intervals, you would need a reasonably large number of samples. There is also a theoretical problem arising with the fact that the different RNA samples are not exchangeable in that they are associated with different sequence depths.

If you really do need confidence intervals for some reason, then I would suggest that you use voom and limma instead of edgeR. SEs are reliable for Gaussian models, so limma is able to return reliable confidence intervals for coefficients. Just set confint=TRUE when running topTable().

0
Entering edit mode

Thank you so much for this answer! I am building a random effects model to compare log fold change across several studies. I had used DESEq2 initially since they provide a log fold change SE and have been previously documented to be used with microbiome data (which is what I am working with).

However, in another set of data I used voom + limma since I had a paired design for the samples. Is there any way we could extract the SE from the confidence intervals in a voom + limma model? By extension, does it also imply that the log fold change standard error (lfcse) reported by DESEq2 are not stable since they follow an underlying negative binomial distribution?

Could you kindly elaborate why the logFC are shrunk towards 0 in edgeR / limma even for the non zero counts? I noticed this as my log fold changes in DESEq2 are somewhat higher in DESeq2 than in edgeR/ voom + limma for the same features. Why is that the case?

Thanks a lot for your time,

Manasi (PhD Epidemiology Candidate, UT SPH)

0
Entering edit mode
nwvidal • 0
@nwvidal-10055
Last seen 5.3 years ago

I would like to show "trends" of gene expression for individual genes of particular interest along a time series (3 points) using edgeR logcpm values. I have three samples in every condition and would like to show confidence intervals in the graph. (This kind of graph is usually seen using RPKM or FPKM values)

Does the same rationale of edgeR logFC applies for edgeR logcpm values?
I mean, I cannot use the standard formula (as for normal or t distribution) to calculate confidence intervals for logcpm. Is there a reliable way to calculate confidence intervals for logcpm values with 3 replicates?

0
Entering edit mode

It's more usual to plot standard error bars rather than confidence intervals on a log-CPM or logRPKM plot. With only three points, many journals will (very sensibly) ask you to plot all the individual values rather than showing mean+-SE.

You can use the usual undergraduate formula for the standard error of the mean to make a plot, but keep in mind that this simple standard error will tend to be more conservative than the inference done by edgeR, i.e., the differences between times will usually appear to be less significant than they actually are. You can't represent edgeR weighted empirical Bayes inference as a standard error.