Question

Why limma, edgeR, and DESeq did not implement fold change moderation first?

2

Entering edit mode

Nik Tuzov ▴ 80

@nik-tuzov-8783

Last seen 3 months ago

United States

Hello:

I have a rather general question about how and why Bayesian inference have been applied in limma, edgeR, and DESeq1-2.

In the simplest differential expression study there are two regression coefficients, an intercept and slope, the latter also denoted by log fold change. LFC estimates obtained across the features (genes) are quite noisy and are likely to benefit from shrinkage. In early 2000s there were enough general purpose shrinkage methods, such as Ridge Regression, that could help with that directly or at least provide some methodological clues.

However, LFC shrinkage was more or less ignored in limma, edgeR, and DESeq between 2004 and 2013 until it was added to DESeq2 in 2014. DESeq2 paper (Love et al, 2014) mentions three other Bayesian papers that moderate LFC, but they are all dated 2012-2013. Between 2004 and 2012, shrinkage was applied only to the 2nd order (variance) parameters, and my question is why.

The first version of limma (Smyth, 2004) provides an answer but it's very partial. In that paper they proposed a two component prior for LFC: LFC is equal to zero with probability p > 0 and is N(0, sigma) otherwise. The posterior odds statistic shrinks both LFC and 2nd order parameters. It then turned out that there are technical issues with estimating p and sigma from the data, so what we end up using in practice is the “moderated t-statistic” that doesn't actually moderate LFC.

What happened next was rather puzzling. Instead of using a two component model, in 2014 Love et al proposed a straightforward two stage procedure: first, variance parameters get moderated. Second, assuming the variance parameters are known, apply something quite similar to Ridge regression where the prior for LFC is N(0, sigma), without the zero component. Apparently it worked fine in practice, so my question is why something like that was never considered before, especially in limma that is based on a simple linear model.

I can speculate that the two component prior for LFC seemed far more adequate to Smyth than N(0, sigma) prior, so, after the former didn't work out he decided not to pursue LFC moderation with the latter. Another possibility is that DESeq2 might have some estimation issues similar to that of posterior odds in Smith, 2004. Please let me know what you think.

limma edger deseq deseq2 • 4.2k views

ADD COMMENT • link updated 6.6 years ago by Gordon Smyth 50k • written 6.6 years ago by Nik Tuzov ▴ 80

1

Entering edit mode

This isn't so much an answer, but a comment. While I of course think a Bayesian posterior LFC is a useful statistic, it's not straightforward to implement. Figuring out an algorithm to output a reasonable scale of the prior is not trivial. The shrinkage estimator for DESeq2 is always biased toward 0, by definition, but for some datasets, there is too much bias, and we're working on fixing this. For example, the RNA-seq mixology paper and dataset showed too much bias for DESeq2. It's definitely ongoing work, and we should have some new shrinkage methods in place that can be used via lfcShrink (it looks like not in this devel cycle, but the next one because I want to have enough time for testing before it's user facing).

ADD REPLY • link 6.6 years ago Michael Love 41k

1

Entering edit mode

Another comment, we do reference, in DESeq2 Methods section, microarray methods which proposed moderated fold changes, e.g. Newton et al 2001. This section starts with: "As was observed with differential expression analysis using microarrays, genes with low intensity values tend to suffer from a small signal-to-noise ratio..."

ADD REPLY • link 6.6 years ago Michael Love 41k

0

Entering edit mode

Hello Michael:

Given what Ryan said below, does lfcShrink follow the suggestion I posted here:

C: Interactions in DESeq2

ADD REPLY • link 6.6 years ago Nik Tuzov ▴ 80

1

Entering edit mode

No, we won't be applying priors to contrasts of coefficients, just to coefficients (either all non-intercept or individually).

ADD REPLY • link 6.6 years ago Michael Love 41k

score 3 · Answer 1 · 2017-09-25

Would you mind spelling my name correctly? And who exactly is "Smith et al". Is this me again?

My 2004 paper gave an exact mathematical justification of why shrinking the fold changes does not help with differential expression performance. The posterior odds of DE implies moderation of the variances but not of the means. Furthermore, the moderated posterior odds is shown to be a monotonic function of the moderated t-statistic, meaning that the t-statistic contains full information. The justification is complete, not partial, and doesn't have to do with the estimation of pi0.

Many of the most popular microarray normalization and background correction methods (luma-vst and affy-rma especially), impose a considerable amount of fold change shrinkage as part of the preprocessing. I have argued that in some cases the shrinkage is already over-done at the normalization stage ( https://www.ncbi.nlm.nih.gov/pubmed/20929874 ). In other cases, shrinkage is underdone but remedial action is better taken at the background correction step rather than delaying it to the DE analysis. This is why limma put so much effort into backgroundCorrection early on ( https://www.ncbi.nlm.nih.gov/pubmed/17720982 ).

limma has had an explicit fold change shrinkage function predFCm() since January 2013 but, for the above reasons, is not a routine part of the DE pipeline. One of my students, Belinda Phipson, developed formal empirical Bayes methods for fold change shrinkage for both microarrays and RNA-seq back in 2010.

All limma and edgeR analyses of RNA-seq implement fold-change shrinkage to the extent that we think it worthwhile and they've always done that.

Fold change shrinkage was not overlooked nor was it too difficult. limma and edgeR do what they do because I thought that was best, not because other possibilities were not considered or could not be done. In particular, I believed it was undesirable to overdo the shrinkage because that would give a misleading impression to biologists.

I have to add a personal comment in view of your speculations about my thinking. You do not have the power of mind reading and your representations of the technical material are not always correct. You are welcome to ask honest questions, or to comment on the content of published papers or documentation, but making speculations and assertions about what other people are or were thinking is not appropriate. Getting told that I did this or that because something "didn't work out" really is a bit much.

One day later:

The purpose of this website to help people who are using Bioconductor software to analyse genomic data, in particular to help with syntax problems and usage. I give my limited time to give pointers on how to use the packages that I've authored and to respond to problems and bug reports. As time allows, I also give some advice to genomic researchers on specific analyses.

You are not a Bioconductor user nor a genomic researcher. You are rather a developer for a commercial company that promotes itself in competition with Bioconductor. Partek, the company you work for, has no history of cooperating with or using Bioconductor in any way. The software it produces is closed source and, as far as I have seen, none of the statistical methods are documented in detail.

I don't feel that my obligations to Bioconductor (or those of other bioc developers) extend to having to give detailed tutorials on statistical methods to commercial companies. I already publish and explain my methodologies in great detail in public forums. If Partek would like more background information or a more personalised service, it would be normal to pay for the service as a consultation.

Are your many questions a sign that Partek is planning to implement some more modern statistical methods? So far its statistical methods for genomic analyses seem rather traditional to say the least. Will you give detailed justifications for Partek's statistical methods and historical decisions in the way that you are asking from us?

score 1 · Answer 2 · 2017-09-25

1

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

From my perspective, one big issue with imposing a prior on log fold changes is that your model becomes dependent on your choice of parametrization. Alternate paratrizations of the same design are no longer equivalent. Effectively, in addition to specifying the model space, you now also need to specify the axes in that space along which shrinkage will be performed. People already have enough trouble coming up with the right design without having to worry about whether they chose the right way to parametrize that design to get appropriate shrinkage. And on top of that, you have the issue of determining the appropriate magnitude of shrinkage (i.e. the scale/strength of the prior), as described in Mike's comment.

It's possible that the recent approach taken by DESeq2 with a separate lfcShrink step after choosing the test to perform mostly solved the issue by only shrinking the coefficients/contrasts being tested, but I haven't had a chance to look at it in detail, so I couldn't say for sure. Regardless, the general point is that incorporating LFC shrinkage requires adding a lot of complexity to the model, and in particular a lot of complexity that can't yet be easily hidden from the user because we don't yet know what the good, safe defaults are.

Anyway, this isn't a complete answer, just some of the issues that I've seen.

ADD COMMENT • link 6.6 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Is this issue present even when the design is orthogonal?

ADD REPLY • link 6.6 years ago Nik Tuzov ▴ 80

0

Entering edit mode

I think it is. If your design has 3 groups, A, B, and C, then a typical design matrix will have coefficients representing A, B-A, and C-A. In that case, if you want C-B, you need to make a contrast between the latter two coefs. I don't think that shrinking the B-A and C-A coefs will result in equivalent shrinkage for the C-B contrast. Hence my statement that when shrinkage is involved, the choice of parametrization becomes a meaningful part of the model rather than just a matter of convenience.

ADD REPLY • link 6.6 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

There are ways to define the prior so that it is not parametrization dependent. For example, the shrinkage implemented in the edgeR package (predFC is the low-level function) is parametrization invariant. Equivalent contrasts get the same shrunk logFC regardless of how the design matrix is parametrized.

ADD REPLY • link 6.6 years ago Gordon Smyth 50k