Explanation of an ANOVA approach of multiple coefficients in limma with topTable in a microarray analysis
@konstantinos-yeles-8961
University of Salerno, Salerno, Italy

Dear Bioconductor Community,

I would like to ask a specific question about interpreting an "ANOVA" approach in limma with the topTable function. In a previous post of mine ( C: Questions about complex design in limma regarding an agilent microarray dataset ), Aaron helpfully explained the difference between passing a single coefficient to topTable (e.g. coef=1), which tests one specific comparison, and passing several at once (e.g. coef=1:4), which essentially performs an ANOVA test for DE in any of my comparisons. My crucial (and perhaps naive) question is the following: is it sensible to get a considerably larger number of DE genes from the ANOVA approach than from adding up the DE genes found for each coefficient separately? Could this simply be due to the "nature" of ANOVA testing? In other words, what is the key difference in how the statistics and DE genes are computed when moving from, say, coef=2 (a specific comparison) to coef=1:4? For instance, does the ANOVA approach also test for a difference in means between coef=1 and coef=2? Or is this irrelevant, since all the comparisons mentioned were specified in the makeContrasts function (see the post linked above for the code)?

Please excuse this beginner question, but I'm a newbie in R/statistics and this specific point is crucial for me!

Best Regards,

Konstantinos Yeles

limma ANOVA topTable microarray
@gordon-smyth
WEHI, Melbourne, Australia

There's no secret difference between anova and t-tests; it's mostly common sense.

You can think of the anova test as pooling the four separate t-tests into an F-test. If all four t-tests were borderline significant, then the anova F-test will naturally be more powerful than doing separate t-tests, because it will accumulate information from all the tests. Getting one borderline t-statistic may not be surprising, but getting four of them is less likely by chance and will lead to a small p-value. The F-test therefore can be more significant than any of the individual t-tests.

On the other hand, if only one of the t-statistics is large and the other three are small, then the F-test will be less powerful than the t-tests because the small statistics will dilute the large one. So the F-test may be much less significant than the most significant of the t-tests.

In general, if all four contrasts are similar in size then the F-test will be more powerful than the t-tests. If only one of the contrasts tends to show DE, then individual t-tests will tend to be more powerful.
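
The two situations described above can be illustrated with a small numerical sketch. This is not limma code and not how limma computes its statistics. It is a simplified illustration that assumes the four contrast statistics are independent and that the residual degrees of freedom are large, so that each t-statistic behaves like a standard normal z and the pooled statistic behaves like a chi-square on 4 degrees of freedom. The helper names (`two_sided_p_from_z`, `chi2_sf_even_df`, `compare`) and the example z-values are made up for illustration.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal statistic (large-df stand-in for a t-test)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square distribution.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only for positive even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def compare(z_stats):
    """Return (smallest individual p-value, joint p-value).

    Assumption: the statistics are independent, so the sum of their
    squares is chi-square with len(z_stats) degrees of freedom under the null.
    """
    individual = [two_sided_p_from_z(z) for z in z_stats]
    joint = chi2_sf_even_df(sum(z * z for z in z_stats), len(z_stats))
    return min(individual), joint


# Hypothetical statistics for one gene across four contrasts.
borderline = [1.8, 1.8, 1.8, 1.8]   # all four contrasts borderline
one_large = [3.0, 0.2, 0.1, -0.3]   # a single strong contrast, three near zero

for label, zs in [("four borderline", borderline), ("one large", one_large)]:
    best_single, joint = compare(zs)
    print(f"{label}: best single p = {best_single:.4f}, joint p = {joint:.4f}")
```

Under these assumptions, the joint p-value in the first case is roughly 0.011 while each single test gives about 0.072, so the pooled test is the more significant one. In the second case the joint p-value is roughly 0.058 while the best single test gives about 0.003, so the pooled test is diluted. Real limma output uses moderated t- and F-statistics with finite degrees of freedom and takes the correlation between contrasts into account, so actual numbers will differ.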


Dear Gordon, thank you very much for your answer! Just two quick points, to be on the "safe side":

1) Regarding testing several coefficients with coef=1:4: essentially, ANOVA will only test the comparisons that have already been defined as coefficients with makeContrasts, right? For example, if coef=1 represents bystander samples vs control samples at 0.5h, ANOVA will NOT also perform a comparison between coefficients, e.g. coef=2 vs coef=4, correct?

2) Or is the above notion incorrect, and ANOVA actually tests, for each gene, whether any of the contrasts (defined as shown in the previous post) is significantly higher ("different") than the other three?

Please excuse the additional question, but this is the point that confuses me most in this interpretation!


It depends on how the coefficients are defined, i.e., what they mean in the context of the fitted model. Sorry, but I only want to answer the question you asked here. I don't have time to read your earlier post and the long question and answer series with Aaron.