Why limma, edgeR, and DESeq did not implement fold change moderation first?
2
2
Entering edit mode
Nik Tuzov ▴ 80
@nik-tuzov-8783
Last seen 7 months ago
United States

Hello:

I have a rather general question about how and why Bayesian inference have been applied in limma, edgeR, and DESeq1-2.

In the simplest differential expression study there are two regression coefficients, an intercept and slope, the latter also denoted by log fold change. LFC estimates obtained across the features (genes) are quite noisy and are likely to benefit from shrinkage. In early 2000s there were enough general purpose shrinkage methods, such as Ridge Regression, that could help with that directly or at least provide some methodological clues.

However, LFC shrinkage was more or less ignored in limma, edgeR, and DESeq between 2004 and 2013 until it was added to DESeq2 in 2014. DESeq2 paper (Love et al, 2014) mentions three other Bayesian papers that moderate LFC, but they are all dated 2012-2013. Between 2004 and 2012, shrinkage was applied only to the 2nd order (variance) parameters, and my question is why.

The first version of limma (Smyth, 2004) provides an answer but it's very partial. In that paper they proposed a two component prior for LFC: LFC is equal to zero with probability p > 0 and is N(0, sigma) otherwise. The posterior odds statistic shrinks both LFC and 2nd order parameters. It then turned out that there are technical issues with estimating p and sigma from the data, so what we end up using in practice is the “moderated t-statistic” that doesn't actually moderate LFC.

What happened next was rather puzzling. Instead of using a two component model, in 2014 Love et al proposed a straightforward two stage procedure: first, variance parameters get moderated. Second, assuming the variance parameters are known, apply something quite similar to Ridge regression where the prior for LFC is N(0, sigma), without the zero component. Apparently it worked fine in practice, so my question is why something like that was never considered before, especially in limma that is based on a simple linear model.

I can speculate that the two component prior for LFC seemed far more adequate to Smyth than N(0, sigma) prior, so, after the former didn't work out he decided not to pursue LFC moderation with the latter. Another possibility is that DESeq2 might have some estimation issues similar to that of posterior odds in Smith, 2004. Please let me know what you think.

limma edger deseq deseq2 • 4.6k views
1
Entering edit mode

This isn't so much an answer, but a comment. While I of course think a Bayesian posterior LFC is a useful statistic, it's not straightforward to implement. Figuring out an algorithm to output a reasonable scale of the prior is not trivial. The shrinkage estimator for DESeq2 is always biased toward 0, by definition, but for some datasets, there is too much bias, and we're working on fixing this. For example, the RNA-seq mixology paper and dataset showed too much bias for DESeq2. It's definitely ongoing work, and we should have some new shrinkage methods in place that can be used via lfcShrink (it looks like not in this devel cycle, but the next one because I want to have enough time for testing before it's user facing).

1
Entering edit mode

Another comment, we do reference, in DESeq2 Methods section, microarray methods which proposed moderated fold changes, e.g. Newton et al 2001. This section starts with: "As was observed with differential expression analysis using microarrays, genes with low intensity values tend to suffer from a small signal-to-noise ratio..."

0
Entering edit mode

Hello Michael:

Given what Ryan said below, does lfcShrink follow the suggestion I posted here:

C: Interactions in DESeq2

1
Entering edit mode

No, we won't be applying priors to contrasts of coefficients, just to coefficients (either all non-intercept or individually).

3
Entering edit mode
@gordon-smyth
Last seen 5 hours ago
WEHI, Melbourne, Australia

Would you mind spelling my name correctly? And who exactly is "Smith et al". Is this me again?

My 2004 paper gave an exact mathematical justification of why shrinking the fold changes does not help with differential expression performance. The posterior odds of DE implies moderation of the variances but not of the means. Furthermore, the moderated posterior odds is shown to be a monotonic function of the moderated t-statistic, meaning that the t-statistic contains full information. The justification is complete, not partial, and doesn't have to do with the estimation of pi0.

Many of the most popular microarray normalization and background correction methods (luma-vst and affy-rma especially), impose a considerable amount of fold change shrinkage as part of the preprocessing. I have argued that in some cases the shrinkage is already over-done at the normalization stage ( https://www.ncbi.nlm.nih.gov/pubmed/20929874 ). In other cases, shrinkage is underdone but remedial action is better taken at the background correction step rather than delaying it to the DE analysis. This is why limma put so much effort into backgroundCorrection early on ( https://www.ncbi.nlm.nih.gov/pubmed/17720982 ).

limma has had an explicit fold change shrinkage function predFCm() since January 2013 but, for the above reasons, is not a routine part of the DE pipeline. One of my students, Belinda Phipson, developed formal empirical Bayes methods for fold change shrinkage for both microarrays and RNA-seq back in 2010.

All limma and edgeR analyses of RNA-seq implement fold-change shrinkage to the extent that we think it worthwhile and they've always done that.

Fold change shrinkage was not overlooked nor was it too difficult. limma and edgeR do what they do because I thought that was best, not because other possibilities were not considered or could not be done. In particular, I believed it was undesirable to overdo the shrinkage because that would give a misleading impression to biologists.

I have to add a personal comment in view of your speculations about my thinking. You do not have the power of mind reading and your representations of the technical material are not always correct. You are welcome to ask honest questions, or to comment on the content of published papers or documentation, but making speculations and assertions about what other people are or were thinking is not appropriate. Getting told that I did this or that because something "didn't work out" really is a bit much.

One day later:

The purpose of this website to help people who are using Bioconductor software to analyse genomic data, in particular to help with syntax problems and usage. I give my limited time to give pointers on how to use the packages that I've authored and to respond to problems and bug reports. As time allows, I also give some advice to genomic researchers on specific analyses.

You are not a Bioconductor user nor a genomic researcher. You are rather a developer for a commercial company that promotes itself in competition with Bioconductor. Partek, the company you work for, has no history of cooperating with or using Bioconductor in any way. The software it produces is closed source and, as far as I have seen, none of the statistical methods are documented in detail.

I don't feel that my obligations to Bioconductor (or those of other bioc developers) extend to having to give detailed tutorials on statistical methods to commercial companies. I already publish and explain my methodologies in great detail in public forums. If Partek would like more background information or a more personalised service, it would be normal to pay for the service as a consultation.

Are your many questions a sign that Partek is planning to implement some more modern statistical methods? So far its statistical methods for genomic analyses seem rather traditional to say the least. Will you give detailed justifications for Partek's statistical methods and historical decisions in the way that you are asking from us?

0
Entering edit mode

I understood that the moderated t-stat resulted in the same ranking of genes as the posterior odds, but it's not clear why "The posterior odds of DE implies moderation of the variances but not of the means. " I am looking at this version of your paper:

http://www.statsci.org/smyth/pubs/ebayes.pdf

In Section 5, O_g_j is dependent on the parameters p_j and v_0_j that define the prior distribution of regression coefficient beta_g_j. Suppose you knew the true values of these parameters. Would you have no use for them anyway?

1
Entering edit mode

Even if you knew p_j and v_0_j, you could not improve on the ranking provided by the moderated t-statistic, if your aim is to rank by the probability of DE.

IMO you are conflating different aims that are in reality separate (assessment of DE vs estimation of fold changes). To have any useful discussion about fold change shrinkage, one needs to be precise about what one expects to achieve from the resulting estimates, and you haven't done that.

Anyway,  I believe your questions fall outside the purpose of this website, and I have edited my answer to make that clear.

0
Entering edit mode

I think in practice it's hard to separate these two. As far as I know, the user will need a list of genes with LFC estimates that are practically different from 0, and they have to be statistically significant (low p-value or high posterior odds ratio).

0
Entering edit mode

"In particular, I believed it was undesirable to overdo the shrinkage because that would give a misleading impression to biologists."

What exactly do you mean by "misleading impression"?

0
Entering edit mode

The reason I asked the question was personal curiosity. Recently I started to review the older shrinkage methods (ridge, lasso, etc) that focus on shrinking the regression coefficients that describe the mean of response. At the time I had an impression that what popular DE packages did it "backwards", i.e. they began with shrinking the variance parameters and only then switched to the regression coefficients. Thank you for clarifying this for me.

1
Entering edit mode
@ryan-c-thompson-5618
Last seen 12 months ago
Scripps Research, La Jolla, CA

From my perspective, one big issue with imposing a prior on log fold changes is that your model becomes dependent on your choice of parametrization. Alternate paratrizations of the same design are no longer equivalent. Effectively, in addition to specifying the model space, you now also need to specify the axes in that space along which shrinkage will be performed. People already have enough trouble coming up with the right design without having to worry about whether they chose the right way to parametrize that design to get appropriate shrinkage. And on top of that, you have the issue of determining the appropriate magnitude of shrinkage (i.e. the scale/strength of the prior), as described in Mike's comment.

It's possible that the recent approach taken by DESeq2 with a separate lfcShrink step after choosing the test to perform mostly solved the issue by only shrinking the coefficients/contrasts being tested, but I haven't had a chance to look at it in detail, so I couldn't say for sure. Regardless, the general point is that incorporating LFC shrinkage requires adding a lot of complexity to the model, and in particular a lot of complexity that can't yet be easily hidden from the user because we don't yet know what the good, safe defaults are.

Anyway, this isn't a complete answer, just some of the issues that I've seen.

0
Entering edit mode

Is this issue present even when the design is orthogonal?

0
Entering edit mode

I think it is. If your design has 3 groups, A, B, and C, then a typical design matrix will have coefficients representing A, B-A, and C-A. In that case, if you want C-B, you need to make a contrast between the latter two coefs. I don't think that shrinking the B-A and C-A coefs will result in equivalent shrinkage for the C-B contrast. Hence my statement that when shrinkage is involved, the choice of parametrization becomes a meaningful part of the model rather than just a matter of convenience.

0
Entering edit mode

There are ways to define the prior so that it is not parametrization dependent. For example, the shrinkage implemented in the edgeR package (predFC is the low-level function) is parametrization invariant. Equivalent contrasts get the same shrunk logFC regardless of how the design matrix is parametrized.