Question

limma - F-test for many condition

1

Entering edit mode

samuel collombet ▴ 140

@samuel-collombet-6574

Last seen 6.9 years ago

France

Hi all,

Sorry if this question has already been addressed, I did not really found an answer on the site...

I have a dataset of microarray (agilent 1 color) with ~35 condition (corresponding to 3 time series, but with different time points, and where time cannot be consider as a continuous variable), 2 rep each. I have imported them with limma and normalized with vsn normalization.
I would like to identify gene that change in at some point in one of the time course only (corresponding to 7 conditions), like an anova or a likelihood ratio test, in order to perform clustering after.

From the vignette I understood that the F-test of limma would give me what I want, correct?

In the vignette example, to do it on 3 condition, a contrast is made with all pairwise comparisons ; do I really have to do that? if I just use lmfit() and eBayes():
> fit <- lmFit(arrayNorm, design)
> it <- eBayes(fit))
I got a F.p.Value in my fit object, does this correspond to the F-test for all the element in the design matrix?

If so, to perform the test on just a subset of my data, can I just give a design matrix with only the samples and factors that I am interested in?

I understood from few post online that Benjamini Hochberg correction is not optimal for F-test (am I wrong?), but I did not really got what should I use instead in this case. Could someone light me on that?

Thanks!

limma F-test • 5.1k views

ADD COMMENT • link 9.2 years ago • updated 9.1 years ago samuel collombet ▴ 140

score 4 · Answer 1 · 2015-02-27

The F-statistics in your MArrayLM object are based on all the coefficients in your model, so if you have an intercept, it is not likely what you want (because the intercept estimates the mean expression for the 'baseline' group, which isn't of interest). Instead, you want to use topTable() without a coefficient, in which case the intercept will be removed from consideration. In any case, it is always safer to use methods that are intended to extract data (topTable), rather than directly accessing data in the MArrayLM object.

But do note that you have to include all the comparisons you care about in your model, or the F-test cannot test for all the coefficients you care about. As for the BH method, it operates on p-values, not the statistic that was used to generate the p-value, so I can't see why anybody would say it isn't optimal for a particular test. But I would be interested to know where you got that idea. Perhaps I am wrong about that.

score 3 · Answer 2 · 2015-02-27

You don't say what vignette have read, but I guess that you are referring to Section 9.3 "Several Groups" of the limma User's Guide. In the Section 9.3 example, the easiest way to do an F-test between groups would be:

> design <- model.matrix(~f)
> fit <- lmFit(eset, design)
> fit <- eBayes(fit)
> topTable(fit)

This is essentially what James has suggested. You can also get the desired F-statistics and p-values directly by fit[,-1], which removes the intercept and recomputes the F-statistics.

To answer your second question, no, you should not run a separate analysis with fewer samples and a smaller design matrix every time you want to answer a particular question. Just keep the same fit object and extract the contrast you want. That is faster, easier and better.

There is no problem with BH in the context of F-tests. If you want to ask a question about something you have read elsewhere, please give a link to the post that you are asking about. Otherwise, as James has said, it is hard to know why you would ask this question.

score 0 · Answer 3 · 2015-03-04

0

Entering edit mode

samuel collombet ▴ 140

@samuel-collombet-6574

Last seen 6.9 years ago

France

Thanks James and Gordon!

Gordon I was indeed referring to section 9.3 of the User's guide.

AS for the BH correction, my bad, it was my misinterpretation.

ADD COMMENT • link 9.1 years ago samuel collombet ▴ 140