Question: Help with Limma F.p.value from eBayes
0
10 weeks ago by
rs04120
rs04120 wrote:

I have fit this model on microarray data (n genes = 15,700) using the "mgenes" eset

design = model.matrix(~group + age + gender + medUse)


age is continuous gender and medUse are binary group is categorical with different mouse strains. The normal mouse strain is set as the first condition in the group. We use several samples from the same mouse, so have set that as a block

corfit = duplicateCorrelation(mgenes, design, block=factor(mgenes$mouse_id)) fit = lmFit(mgenes,design,block=mgenes$mouse_id,correlation=corfit$consensus) fit2 = eBayes(fit)  when I look at the F-statistics (fit2$F.p.value), the p-values are difficult to interpret: most are zeros (0.000000e+00) and the rest are very small (1.298202e-109 is the largest after 0)

My goal is to use the F-statistic p-value to reduce the number of genes that I test, to reduce the multiple testing burden, before looking at the moderated t-statistics and associated p-values.

Any help is greatly appreciated!

limma • 226 views
modified 10 weeks ago by Gordon Smyth37k • written 10 weeks ago by rs04120
Answer: Help with Limma F.p.value from eBayes
2
10 weeks ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Use topTable to do your F-tests, don't try to poke around in the internals of the MArrayLM object unless you know what you're doing. The F.p.value field contains p-values for the F-test on all coefficients, including the intercept - this corresponds to the null hypothesis that all expression values are zero, and is unlikely to be sensible for single-channel arrays (and probably of limited utility for two-channel arrays).

Your motivation for using the F-test is also misguided. The F-test involving all terms is not uniformly more powerful than a t-test for a specific term, so your pre-filtering might end up throwing out genes that would otherwise be detected by using the t-test directly.

set.seed(42)
y <- matrix(rnorm(60000), ncol=10)
y[,1:2] <- y[,1:2] + 2 # DE genes in group 1.

library(limma)
design <- model.matrix(~gl(5,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)

res.F <- topTable(fit, n=Inf, sort.by="n")
res.t <- topTable(fit, coef=2, n=Inf, sort.by="n")

# Lots of genes lost by the F-test that are
# detected directly by the t-test.
table(SigF=res.F$adj.P.Val <= 0.05, SigT=res.t$adj.P.Val <= 0.05)


In addition, your second round of multiple testing correction fails to account for the fact that the gene list has effectively been pre-filtered based on the same data used in the t-test! This undermines the validity of the multiple testing, resulting in more false positives in the t-test DE gene list.

set.seed(1000)
y <- matrix(rnorm(60000), ncol=6)

# Adding lots of DE genes, but only in group 3.
N <- 2000
y[1:N,5:6] <- y[1:N,5:6] + 2

# Doing an F-test for any difference between groups.
library(limma)
design <- model.matrix(~gl(3,2))
fit <- lmFit(y, design)
fit <- eBayes(fit)
res.F <- topTable(fit, n=Inf, sort.by="n")

# Subsetting and repeating for the 2 v 1 comparison
keep <- res.F$adj.P.Val <= 0.05 res.2 <- topTable(fit[keep,], coef=2, n=Inf, sort.by="n") # Consistently non-zero detected features, despite the fact that # the null is true for all 2 v 1 comparisons - these are false positives. sum(res.2$adj.P.Val <= 0.05)

# Correct FDR adjustment should only yield non-zero values 5% of the time.
unfilt <- topTable(fit, coef=2, n=Inf, sort.by="n")
sum(unfilt$adj.P.Val <= 0.05)  I would suggest not trying to do something fancy, and just to do the test that corresponds directly to your scientific question. If you're interested in the pairwise comparisons - just do them. ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Aaron Lun24k Thank you so much. This is incredibly helpful. I appreciate your time and expertise. ADD REPLYlink written 10 weeks ago by rs04120 Dear Aaron, May I ask you to comment the two following simulations based on your simulation concerning F vs t test. Here I increased the number of replicates from 2 to 4 and set the number of DE genes to 300. I don't feel any interest in estimating an effect with two samples. In that case, F is finding more genes than t. set.seed(42) replicates <- 4 # per group y <- matrix(rnorm(30000*replicates), ncol=5*replicates) dim(y) #> [1] 6000 20 positives <- 300 # 5% of 6000 genes as DE y[1:positives,1:replicates] <- y[1:positives,1:replicates] + 2 # DE genes in group 1. library(limma) design <- model.matrix(~gl(5, replicates)) fit <- lmFit(y, design) fit <- eBayes(fit) res.F <- topTable(fit, n=Inf, sort.by="n") #> Removing intercept from test coefficients res.t <- topTable(fit, coef=2, n=Inf, sort.by="n") # Overall results table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05) #> SigT #> SigF FALSE TRUE #> FALSE 5875 19 #> TRUE 45 61 # Knowing the defined simulation table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05, Positives=seq(nrow(y))<=300) #> , , Positives = FALSE #> #> SigT #> SigF FALSE TRUE #> FALSE 5691 1 #> TRUE 6 2 #> #> , , Positives = TRUE #> #> SigT #> SigF FALSE TRUE #> FALSE 184 18 #> TRUE 39 59  Created on 2019-04-14 by the reprex package (v0.2.1) Next, I tested the F test being biased by the intercept, as I interpreted your remark The F.p.value field contains p-values for the F-test on all coefficients, including the intercept - this corresponds to the null hypothesis that all expression values are zero set.seed(42) replicates <- 4 # per group y <- matrix(rnorm(30000*replicates), ncol=5*replicates) dim(y) #> [1] 6000 20 positives <- 300 # 5% of 6000 genes as DE y[1:positives,1:replicates] <- y[1:positives,1:replicates] + 2 # DE genes in group 1. # Make all expressions clearly different from zero y <- y + 10 library(limma) design <- model.matrix(~gl(5, replicates)) fit <- lmFit(y, design) fit <- eBayes(fit) res.F <- topTable(fit, n=Inf, sort.by="n") #> Removing intercept from test coefficients res.t <- topTable(fit, coef=2, n=Inf, sort.by="n") # Overall results table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05) #> SigT #> SigF FALSE TRUE #> FALSE 5875 19 #> TRUE 45 61 # Knowing the defined simulation table(SigF=res.F$adj.P.Val <= 0.05,
SigT=res.t$adj.P.Val <= 0.05, Positives=seq(nrow(y))<=300) #> , , Positives = FALSE #> #> SigT #> SigF FALSE TRUE #> FALSE 5691 1 #> TRUE 6 2 #> #> , , Positives = TRUE #> #> SigT #> SigF FALSE TRUE #> FALSE 184 18 #> TRUE 39 59  Created on 2019-04-14 by the reprex package (v0.2.1) The table results are exactly the same. This sounds to be linked to the message Removing intercept from test coefficients. Let me know. ADD REPLYlink written 10 weeks ago by SamGG190 In that case, F is finding more genes than t. Sometimes it will, sometimes it won't. My point is that the F-test is not guaranteed to detect a superset of the genes detected by the t-test, which is also the case in your altered scenarios. This may discard genes that would otherwise have been detected by direct use of the t-test. tl;dr Why do something that (i) requires more work, (ii) potentially reduces power, and (iii) potentially increases the false positive rate? If you care about pairwise comparisons, just do them. The table results are exactly the same. This sounds to be linked to the message Removing intercept from test coefficients. I don't know what you're getting at here. My remark was about the F.p.value field within the MArrayLM object returned by eBayes. This is quite different to the P.Values from topTable. ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Aaron Lun24k Thanks for your reply. In my case, the "F-test before t-test" strategy comes from the ANOVA. Could you point me any peer reviewed article that discuss the benefits of carrying out t tests directly over the F-then-t strategy in the context of high-dimensional data? Or is it a property of the linear model? Concerning the F.p.value, I made a shortcut. My bad. ADD REPLYlink written 10 weeks ago by SamGG190 The "high-dimensional" aspect is irrelevant. limma fits gene-wise linear models, with no information shared between genes other than through empirical Bayes shrinkage (which has nothing to do with the current problem). You would see the same long-run behaviour with respect to loss of power and type I error control even if you only had one gene. I don't know of any peer reviewed articles that address this problem, but I'd say that's because it's pretty self-evident. The null distribution of the t-statistic will change once you condition on the F-statistic (computed from the same data) being greater than a certain threshold value; all guarantees related to the validity of the subsequent t-tests are lost. ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Aaron Lun24k @SamGG, you haven't compared the F-test to the t-tests correctly. You used an F-test to compare all 5 groups but then conducted only one t-test instead of 4. If you make a fair comparison then the t-tests detect 169 genes and the F-test only 106: > fit <- fit[,-1] > results.t <- decideTests(fit, method="separate") > results.F <- decideTests(fit, method="nestedF") > sum(rowSums( abs(res.t) ) > 0) [1] 169 > sum(rowSums( abs(res.F) ) > 0) [1] 106  I have commented on this issue before, see for example https://support.bioconductor.org/p/16689 . The F-test is more power than the t-tests when there are several non-zero coefficients of similar sizes. The t-tests are more powerful when one or two coefficients are much larger than the others. These principles do not depend on how many replicates you have or how large the intercept is, but the effect does get larger as the number of groups increases. But power measured this way is not appropriate for the OP's experiment anyway. See my separate answer to the OP. ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Gordon Smyth37k Thanks for your reply and pointing the interesting post you wrote more than ten years ago (you are a library definitively). As answered also to Aaron, could you point me any peer reviewed article that discuss the benefits of carrying out t tests directly over the F-then-t strategy in the context of high-dimensional data? I think this could be valuable for the omics community. ADD REPLYlink written 10 weeks ago by SamGG190 Answer: Help with Limma F.p.value from eBayes 1 10 weeks ago by Gordon Smyth37k Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia Gordon Smyth37k wrote: The idea that you propose, using genewise F-tests as a screen before doing t-tests for each coefficient, is easily implemented in limma. In limma terminology, this procedure is called hierarchical multiple testing with the F-test as the genewise test. The code would be: fit2 <- fit2[,-1] results <- decideTests(fit2, method="hierarchical", genewise.p.value=fit2$F.p.value)


The first line of code removes the intercept from the model, yielding an F-test analogous to the standard overall model F-statistic in textbook regression theory.

However I strongly recommend against this approach for your experiment. In your case, you have several covariates that are not of primary interest (gender and age) and one or two variables of primary interest (either group or medUse, I can't tell which is of primary interest from your post). You should not be putting the nuisance covariates on the same level as the variable of primary interest, which is what the F-test does. You should not be highlighting genes with large age or gender effects. You instead should test for the coefficient of primary scientific interest on its own.

The F-test does not really reduce the multiple testing burden, because an F-test for multiple contrasts is like a form of multiple testing. The difference is that the F-test as more power to detect genes with significant signals for several coefficients at once while the t-tests have more power to detect genes with one big t-test. See Aaron's answer or an old post from me https://support.bioconductor.org/p/16689 for more discussion of this.