Question

Moderated F-test followed by moderated t-test in limma

jurgen.claesen • 0

Dear all,

I have a somewhat fundamental question about differential expression detection in limma. Assume I want to analyze a microarray experiment in which gene expression is measured for a limited set of genes (say 1000) in 8 different conditions. The main aim is to identify which of these genes differ in each pairwise comparison (AvsB, AvsC, ..., AvsH, BvsC, et cetera).

In a "traditional" setting, one would first apply an F-test and, whenever this F-test has a p-value below the significance level, proceed to the pairwise comparisons. However, it seems that in limma, when using topTable() after eBayes(), the moderated F-test and the moderated t-tests for the pairwise comparisons are done at the same time. This means that the correction for multiple testing is applied to all genes, regardless of whether the p-value of the F-test is below the significance level, which could lead to more false negatives.

Is it possible to mimic the "traditional" approach in limma, where the t-tests are done after selecting genes based on the F-test? Would this require refitting the model and hence applying the empirical Bayes approach to a smaller set of genes (which can lead to less precise estimation of the variances)?

Thank you, Jürgen

limma differential expression • 1.5k views

updated 3.9 years ago by Gordon Smyth • written 3.9 years ago by jurgen.claesen

score 3 · Answer 1 · 2020-08-22

Yes, a multiple gene version of the "traditional" F-test followed by t-tests approach (F-then-t) is implemented in limma in the "hierarchical" method of decideTests. No, it does not require any model refitting.
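As a rough illustration of the call described above, here is a minimal sketch on simulated data. The expression matrix, the three replicates per condition, the condition labels A to H, and all object names (`y`, `groups`, `fit2`, `res_hier`, `res_sep`) are assumptions made up for this example and are not from the question. Only `lmFit()`, `makeContrasts()`, `contrasts.fit()`, `eBayes()` and `decideTests()` are limma functions.

```r
library(limma)

# Simulated stand-in for the experiment: 1000 genes, 8 conditions,
# 3 replicates each (the replicate number is an assumption).
set.seed(1)
n_genes <- 1000
groups <- factor(rep(LETTERS[1:8], each = 3))
y <- matrix(rnorm(n_genes * length(groups)), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Group-means design: one coefficient per condition.
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# All 28 pairwise contrasts (A-B, A-C, ..., G-H).
pairs <- combn(levels(groups), 2)
contrast_matrix <- makeContrasts(
  contrasts = paste(pairs[1, ], pairs[2, ], sep = "-"),
  levels = design
)
colnames(contrast_matrix) <- paste0(pairs[1, ], "vs", pairs[2, ])

# Fit once; the same fit serves both testing strategies below.
fit <- lmFit(y, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_matrix))

# F-then-t: genes are screened on the moderated F-test first,
# then contrasts are tested within the selected genes.
res_hier <- decideTests(fit2, method = "hierarchical",
                        adjust.method = "BH", p.value = 0.05)

# For comparison: each contrast adjusted separately across genes,
# using the moderated t-tests alone.
res_sep <- decideTests(fit2, method = "separate",
                       adjust.method = "BH", p.value = 0.05)

summary(res_hier)
summary(res_sep)
```

Both calls reuse the same `fit2` object, so the empirical Bayes variance estimates are still based on all genes. Only the multiple testing step differs between the two results.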

You should be aware though that the "traditional" F-then-t approach from early statistics textbooks doesn't have any theoretical advantages over more direct multiple testing approaches using t-tests alone. It does not generally improve statistical power even in the traditional univariate context and (in my experience) is only a minority method in the biomedical literature. The F-test is useful if the F-test null hypothesis is what you want to test but, if your ultimate intention is to control the error rate for the pairwise comparisons, then it doesn't help. The F-test also doesn't fit well with the newer concepts of FDR control because it is inherently controlling the FWER across the t-tests instead of the FDR.

There is no published theory for generalizing the F-then-t approach to multiple gene contexts in which multiple testing corrections have to be applied first to the F-tests and then to the t-tests (that's why the limma method is called "hierarchical"). Nor is there any theory for how to use F-tests for FDR control. For both of these reasons, the limma statistical method is novel and unpublished. You're welcome to use it, but I haven't found it to have any strong advantages so I don't recommend it or use it in the limma case studies.