Dear all,
I would like to extend the discussion on my Design and Contrast Matrix for Linear Model. To Combine or not to Combine?
Briefly, I have data from two groups that are followed longitudinally at three different time points. After going through the workflow of the limma package, (lmFit, contrast fit, and eBayes), now here comes the decideTests.
As the contrasts are closely related, I opted to use 'method=nestedF' instead of 'method=separate', which, if I understand correctly, will identify the contrast in which a gene is deemed significant after the moderated F-tests (that subsequently will reduce the number of multiplicity adjustments that I have to do).
What is your opinion regarding my approach?
Thank you for your response.
Dear Prof. Smyth,
Suppose that I want to test the following null hypotheses:
contrast
as defined below).I find single-factor design easier to understand, so I create a new variable combining
group
andtimepoint
.Then, I proceed with the workflow until I get an
eBayes
object namedfit2
.Using topTable, is it correct if I answer my first hypothesis with:
and my second hypothesis with:
And if I understand correctly, FDR adjustment for topTable is done with
method=separate
,right?However, the contrasts above, as I understand it, are closely related. Then, is
method=separate
the right one for this ormethod=global
?And my burning question is, does the approach above equivalent with Anova followed with post-hoc T tests?
Thank you for your time, Prof. Smyth. It is very easy to get an output from the software but then it is actually quite challenging to do it right.
1.
method
is only an argument fordecideTests
. FortopTable
, the manner of the correction is unambiguous - you have a set of p-values, and you apply the Benjamini-Hochberg correction across them.2. As mentioned in 1,
method
doesn't have a role here. Nonetheless, I will try to answer what seems to be your more conceptual question; should we apply the BH method to the p-values within each contrast or to the pool of p-values across all contrasts? This comes down to the level of FDR control that you want. I would say that controlling the FDR for DE genes within each contrast generally makes most sense for interpretation, i.e., in my set of DE genes, I expect no more than 5% of them to be false positives. That's pretty clear to most people, especially the end-users (i.e., wet-lab people) who use it to make decisions.However, there are also cases where the FDR across all contrasts may be interest, in which case applying the BH method globally is more appropriate. This can be the case when you want to make statements across contrasts, e.g., "the comparison between A and B yields twice as many DE genes as the comparison between C and D". By applying the BH method to the pool of p-values, you ensure that the same p-value threshold is used to define DE genes across all contrasts. This avoids inflated differences from the "rich getting richer" behaviour of the BH method, where the presence of some low p-values relaxes the threshold for other genes.
3. No. But why are you worrying about that anyway? It sounds like you want to know which genes are DE between pairs of groups, so just forget about the ANOVA and use the individual contrasts in
topTable
. There's no point worrying about minimizing the number of multiplicity adjustments - if you need inferences for individual contrasts, then you're going to have to correct for them anyway, ANOVA or otherwise. Even post hoc tests involve some implicit correction for multiple testing across contrasts.I should also add that the FDR is quite generous when it comes to multiple contrasts. If you get a DE set with 5% false discoveries, and another DE set with 5% false discoveries, and you pool the results, then the pooled set will still have a FDR of 5%. (I'm assuming that you retain a distinction between significant results for same gene but different contrasts; and that at least one true positive is detected in each component set.) So the penalty is less severe than you might think.
By comparison, post-hoc tests like Tukey's control the FWER across contrasts, which is a more conservative criterion - especially if you don't intend to perform all pairwise contrasts. Moreover, this is only done for a single gene, so you'll still have to correct for multiple testing across genes. Finally, it's never been clear to me what is the error rate when you only run post hoc tests on the genes that are detected as being significant from ANOVA. It's much easier to interpret the FDR for specific pairwise comparisons - and if you don't care about an interpretable error rate, then what's the point of doing any of this?
Thank you for your elaborate reply, Aaron.
The point that I wanted to make (or confirm) for the first question is that topTable will control for FDR by treating the p-values as a single vector of values, just like supplying
method=separate
in decideTests to adjust for FDR separately for each tested contrasts. I do know thatmethod
is an argument fordecideTests
only. I guess I was not clear on that before.For my second question, I want to compare the number of DE genes across the contrasts. Thanks to your explanation, I am now confident to use
method=global
for mydecideTests
. Indeed, in several of my contrasts, there are a few very very low p-values that 'cranked up' other genes, thereby inflating the number of DE genes if I usemethod=separate
compared tomethod=global
.P.S. Did I use the correct method to answer my first and second hypotheses?
No, your code is incorrect, and would lead to errors if you tried to run it, because topTable doesn't accept a contrast argument.
The idea of anova followed by post-hoc tests is not a helpful concept here. All the limma tests are conducted post-hoc.
I did not elaborate on that for the sake of brevity, but then apparently clarity was sacrificed. I fit an
lmFit
object namedfit
with thecontrasts
as below.To get the summary of DE genes in all contrasts, I did the following:
My bad with the codes above as I was not working with my script beside me. I will be much more careful next time. As I supplied
method="global"
argument to mydecideTests
, I usedwrite.fit
to get my DE genes for each contrasts, not thetopTable
command.