Using multiple tools for differential gene expression analysis as validation
Entering edit mode
Last seen 16 days ago
Antwerp, Belgium


I'm working on a transcriptome sequencing dataset (not RNA-seq but QuantSeq libraries, but that's not important for this question) and I am (among other things) interested in differential gene expression analysis. I already use edgeR and DESeq2 and would add limma soon. Now I wonder whether I can use these tools to validate the identified genes?

Of course I need wet lab validation to some extent (qPCR for some interesting genes will be started next week), but does using the overlap of the three methods give me a higher confidence of the identified genes, or are those tools essentially testing the same thing (and as such not adding evidence).

I assume the statistical tests are not identical, but I expect some parallel ideas exist? My understanding of the statistical background is insufficient to make this judgment.

Thanks in advance for your opinion!

edger limma deseq2 rnaseq • 1.8k views
Entering edit mode
Last seen 23 months ago
Scripps Research, La Jolla, CA

Ok, first, a quick summary of the available tests: edgeR and DESeq2 both offer a likelihood ratio test based on a negative binomial generalized linear model. DESeq2 also offers a Wald test, while edgeR also offers the quasi-likelihood F-test. limma offers ordinary linear model t-tests and F-tests. All of them use empirical Bayes methods to squeeze dispersion/variance estimates toward the mean of the dataset. (edgeR and DESeq 1 also offer some older tests, but I believe they're effectively deprecated now, and I can't think of a reason you would choose them over the ones mentioned above.)

Second, let's look at the theoretical differences between the tests: The edgeR and DESeq2 LRTs will not give identical results because they use slightly different methods to estimate the dispersions. However, if you force both of them to use the same dispersions, they should give you identical results. I believe the Wald test and LRT are asymptotically equivalent, although experiments with few samples might not be close to the asymptotic behavior. In addition, DESeq2 shrinks the logFC values toward zero when using the Wald test, but not the LRT. I believe the logFC values and standard errors are generally shrunk by a similar amount, so this shrinkage doesn't have much of an effect on the p-values. The LRT & Wald test are generally more liberal than the QL test and limma, because they require estimating the dispersions in a separate step and then fitting the GLM assuming the dispersions are estimated perfectly, ignoring the estimation uncertainty.

In practice, I've found that both LRTs and DESeq2's Wald test give quite similar results to each other on the same test, while edgeR's quasi-likelihood F-test and limma's t/F-tests also give quite similar results to each other. You will usually get more genes with the LRT (and Wald), but their liberalness also causes them to sometimes give more false positives than the nominal FDR would imply (e.g. giving many significant genes on a null comparison). The QL test and limma, in contrast, are more conservative (fewer significant genes), but tend to maintain proper false positive control in most cases. In theory, limma's use of the normal distribution as an approximation breaks down at low counts, and anecdotally this seems to be the case. The deeper your sequencing is and the more samples you have, the better limma will be. limma will also run much more quickly than the others on large data sets due to the use of a linear model instead of a GLM. Lastly, limma can fit some experimental designs that cannot be fit by edgeR or DESeq2, specifically models with unequal group/sample variances (beyond the inherent differences in variance due to differences in abundance) and certain kinds of mixed models. However, this probably isn't relevant for your purposes since you seem to already have a design that works with all 3.

Anyway, to answer your question, DESeq2, edgeR and limma all give pretty similar results, though there are some cases when one is better or worse than the others. I think you would be better off choosing the most appropriate one for your experiment rather than trying to average or overlap their results. My general advice is, for lots of well-sequenced samples, use limma and enjoy the speed. For few samples and/or low sequencing depth, use DESeq2 or edgeR. My personal preference is to use the QL test rather than the LRT or Wald tests because I usually prioritize strict false positive control over a larger gene list, but your priorities may be different. In any case, the differences between methods are minor compared to the importance of using exploratory data analysis to ensure you are fitting the model that best describes your data.

However, I often find it useful to run more than one tool simply for the diagnostics that they produce. For example, running edgeR or DESeq2 to view the BCV/dispersion plots even though I'm using limma, or running limma's voomWithQualityWeights to see if it suggests any outlier samples even though I plan to use edgeR.

(To any of the package authors: This is all from memory. Feel free to correct me on any details I got wrong.)

Entering edit mode

One small note on the Negative Binomial methods and "unequal group variances": groups with different expected count will of course have different within-groupĀ variances according to the NB model, it's the dispersion (~square of the coefficient of variation) that is fixed for a given gene. I know this is what you meant, but it sometimes confuses people who are reading about the NB models for the first time.

Entering edit mode

You're right, I'll edit my post.

Entering edit mode

Great elaborate answer, thanks a lot for the explanation and your opinion!


Login before adding your answer.

Traffic: 193 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6