Entering edit mode
Gary Churchill
▴
10
@gary-churchill-1543
Last seen 10.2 years ago
Regarding the thread on limma versus maanova. Clearly I can only
offer a biased point of view, but I would strongly encourage people to
try the methods that are implemented in both software packages. A
major motivator for us to contribute R/maanova to the Bioconductor
project was to facilitate this kind of comparison.
We made an effort to create a version of R/maanova that integrates
easily with other Bioconductor packages. Feedback on how well this
does or does not work is most welcome. It can always be improved.
I'd like to clarify the current state of R/maanova. We've come a long
way since Kerr, Martin and Churchill (2000, JCompBiol).
The core of R/maanova is a mixed model ANOVA program. We benchmarked
and validated it against SAS/Proc Mixed and it compares well for speed
and accuracy. The syntax is deceptively simple. R/maanova is capable
of fitting essentially any mixed model to any experimental design.
R/maanova is specialized for microarray data. This restriction
provides some efficiency in the implementation.
R/maanova includes an EB test statistic. Indeed we recommend and
exclusively use the "Fs" test statistic in applications. This is an
empirical Bayes test based on shrinkage estimates of variance
components. We formulated it as F-like statistic rather than an odds
ratio, but it is still an "information borrowing" statistic. There is
now ample evidence in the literature that EB statistics are the way to
go. In our simulation studies both B (limma) and Fs gave comparable
performance. Both of these statistics outperform, the simple gene-by-
gene F test (F1 in R/maanova) and the pooled variance method (F3 in
R/maavova) and most other methods to which we made comparisons (e.g.,
SAM). Some of these results can be found in Cui et al 2005
Biostatistics. The bottom line here is that B and Fs are not
fundamentally different in settings where both apply. However Fs is
more general in allowing for multiple variance components and in the
ability to test a wider range of hypotheses.
I am puzzled by the reference to gene-by-gene analysis as a feature of
R/maanova. In R/maanova you can restrict information to only the
gene tested (F1), you can pool variance estimates across all genes
(F3) or you can use an EB approach (Fs). The latter is recommended.
As noted in the thread, the current implementation of limma does not
allow for multiple sources of variance. In some cases, this is not an
issue. Treating array effects as fixed or as random does not have
much impact (except in the speed). However if your design is nested
with multiple assays of the same biological sample, failure to treat
the biological replicates as a random effect can be seriously
misleading. The tests will be liberal, often very liberal. One work-
around is to average all of the data from a single bio rep before
doing the analysis. R/maanova will allow you to fit the full mixed as
well as the (inappropriate) fixed effect model - so you can do the
wrong analysis and compare the two sets of results.
R/maanova offers many options for permutation testing. Hao Wu worked
out a scheme to ensure that permutations are done properly in the
presence of stratification variables (covariates). However we are
concerned by the recent paper of Xie et al (Bioinformatics 2005).
R/maanova will pool p-values across genes by default (not pooling is
an option). Although pooling was recommended by Storey et al (in the
Parmagianni volume) I am now feeling uneasy about it. To pool or not
pool p-values? It is an open question. We are noticing that the FDR
corrections can be very fragile. This is also worth a closer look,
but it is a methodology question, not a software issue.
Missing/flagged data can now be handled in R/maanova using a single
imputation method. Not ideal but better than throwing out data.
It is a good idea to check your data for dye-intensity effects and for
spatial effects. R/maanova includes multiple data transformation
options. We use the intensity loess routinely and spatial loess only
in dire situations. I would be interested to hear from users who have
used other bioconductor tools for pre-processing in conjunction with
R/maanova. It was our intent to make this possible and easy.
R/maanova works with Affy data. This seems to be a widespread
misunderstanding. We routinely do the pre-processing of Affy data
using other bioconductor functions. The output from rma, for example,
can be piped right into R/maanova. R/maanova will also analyze three
(four, five, ...) color arrays if you happen to be using that
technology. (We can provide example scripts for Affy analysis on
request.)
Regarding the analysis of intensity versus ratios: Reduction of data
to ratios destroys information. If you have readings of 100, 200 for
gene1 and 10000, 20000 for gene2, the ratio is 2 in both cases. You
cannot recover the lost information. Perhaps it is not a big deal in
some settings because there is little interest in the 'interblock
information'. Treating array as fixed will have much the same effect
as taking ratios when ratios make sense. But there are many designs
for which there is no obvious reduction of the data to ratios. For
example, a three way comparison without a reference sample. (You
could use R/maanova to analyze ratio data, it may require some data
manipulation in native R to get it set up. I am not recommending it.)
Both R/maanova and R/limma are powerful analysis tools. Flexibility
is a strong point for both but perhaps more so for R/maanova. As with
any good tool, these can be misapplied. The discussion that prompted
my comment indicates that people are thinking about these issues. I
am not as well informed about R/limma as I should be, so I apologize
if I have made any misrepresentations. I hope that this contribution
is helpful and that people will continue to use both tools!
Gary
----------------------
Gary A. Churchill
Senior Staff Scientist
The Jackson Laboratory
600 Main StreetBar Harbor, ME 04609
ph: (207) 288-6189
fax: (207) 288 6077
email: gary.churchill at jax.org
<http: www.jax.org="" staff="" churchill="" labsite=""/>
----------------------
Now he needed a great variety of models whoseelements could be
combined in order to arriveat the one that would best fit reality.
-Italo Calvino.