I apologize for not thanking you more quickly for your detailed
response. I think I agree with everything you've said below, but now I
have another concern on which I would like your opinion.
For many of the data sets I've dealt with, for many genes, the variances
of the two classes do not seem to be equal. For example, the code below
uses var.test() to produce a p-value for each gene and then plots a
histogram of the p-values. The histogram can be viewed at
The model implemented in limma seems to assume a single variance for
Do you think this is a problem?
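library(ALL)  # assumed: the Bioconductor ALL data package (also loads Biobase)
data(ALL)
pdat <- pData(ALL)
# Keep B-cell samples whose molecular type is BCR/ABL or NEG
subset <- intersect(grep('^B', as.character(pdat$BT)),
                    which(pdat$mol %in% c('BCR/ABL', 'NEG')))
eset <- ALL[, subset]
i1 <- which(eset$mol == 'BCR/ABL')
i2 <- which(eset$mol == 'NEG')
# F test for equality of the two class variances, one p-value per gene
pvals <- apply(exprs(eset), 1, function(v) var.test(v[i1], v[i2])$p.value)
jpeg(filename='ALL.jpeg', width=240, height=240)
hist(pvals,
     main='Histogram of var.test() pvals for ALL BCR/ABL vs NEG')
dev.off()  # close the device so the JPEG file is written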
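As a point of reference for reading such a histogram, here is a minimal sketch on simulated data (not the ALL data). It assumes normally distributed values; the group sizes, the number of genes and the standard-deviation ratio of 2 are arbitrary choices made for illustration, and sim_pval is a helper defined only for this sketch. When the two classes really have equal variances, the var.test() p-values should be roughly uniform on [0, 1]; a pile-up near zero suggests unequal variances for many genes, or non-normal data, to which the F test is also sensitive.

```r
set.seed(1)
n_genes <- 2000; n1 <- 30; n2 <- 30   # arbitrary sizes, for illustration only

# p-value of the F test for equal variances for one simulated gene
sim_pval <- function(sd1, sd2) {
  var.test(rnorm(n1, sd = sd1), rnorm(n2, sd = sd2))$p.value
}

p_equal   <- replicate(n_genes, sim_pval(1, 1))  # equal variances
p_unequal <- replicate(n_genes, sim_pval(1, 2))  # second class has 2x the SD

# Fraction of genes with p < 0.05 in each scenario
mean(p_equal < 0.05)
mean(p_unequal < 0.05)
```

Plotting hist(p_equal) next to hist(p_unequal) gives a rough idea of what a flat versus a skewed p-value histogram looks like at these sample sizes.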
> -----Original Message-----
> From: Gordon Smyth [mailto:smyth at wehi.EDU.AU]
> Sent: Thursday, April 20, 2006 8:02 PM
> To: Wittner, Ben, Ph.D.
> Cc: bioconductor at stat.math.ethz.ch; J.delasHeras at ed.ac.uk
> Subject: [BioC] Limma: correct calculation of B statistics (log odds)
> Dear Ben,
> Please see also my longer reply to Jose in a separate email.
> The t-statistics, p-values and gene rankings provided by limma do not
> depend on the assumed proportion. In fact part of the motivation for
> developing the moderated t-statistics was to obtain a statistic with
> the same power as the posterior odds without needing this
> difficult-to-estimate quantity.
> While the B-statistic does depend on the prior assumed proportion,
> this dependence is very straightforward, well understood and
> explicit. The prior log-odds simply adds a constant to all the
> genewise B-statistics. It doesn't change the ordering.
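The point about ordering can be sketched in a few lines. This assumes the posterior log-odds can be written as the prior log-odds plus a genewise term that does not involve the proportion; the vector log_bf holds made-up numbers standing in for that term, and b_stat is a hypothetical helper, not a limma function. It shows only the additive effect described above, with everything else held fixed.

```r
# Hypothetical genewise terms (log Bayes factors); made-up numbers
log_bf <- c(geneA = 3.2, geneB = -1.5, geneC = 0.4, geneD = -4.0)

# Posterior log-odds = prior log-odds + genewise term
b_stat <- function(log_bf, proportion) {
  log(proportion / (1 - proportion)) + log_bf
}

b_01 <- b_stat(log_bf, 0.01)   # assume 1% of genes are DE
b_10 <- b_stat(log_bf, 0.10)   # assume 10% of genes are DE

b_10 - b_01                    # the same constant for every gene
identical(order(b_01, decreasing = TRUE),
          order(b_10, decreasing = TRUE))   # ranking is unchanged
```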
> I agree with your desire to avoid dependence on unjustified
> assumptions. My approach in limma has been to minimise assumptions
> where possible but otherwise to make the assumptions very explicit.
> What I personally feel uneasy about are statistical methods which
> propose to estimate quantities about which the data contains very
> little information. The dependence on assumptions may be hard to
> It seems to me that the proportion of DE genes is just such a
> quantity, because its estimation must be highly sensitive to model
> assumptions in small microarray experiments. I could easily provide
> an automatic estimate of this quantity as part of the eBayes()
> computations in limma, but I deliberately chose not to do this.
> Expanding a little further on this topic, it seems to me that a
> biologically meaningful treatment of the proportion of truly DE genes
> would require a more careful definition of the concept of
> differential expression than has so far appeared in the literature.
> It seems to me that mathematicians and biologists have different
> things in mind when they think of this quantity. Mathematicians are
> including many genes with very small fold changes which the
> biologists would not consider of interest. A biologically
> meaningful treatment would have to specify how large a fold change
> needs to be in order to be considered material. I suspect that
> biologists are going to be surprised by how sensitive the estimated
> proportion is to this threshold.
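A toy calculation gives a feel for this sensitivity. It assumes, purely for illustration, that the true log2 fold changes across genes follow a normal distribution with mean 0 and standard deviation 0.5; neither that distribution nor the thresholds below come from any data set.

```r
sd_lfc <- 0.5                           # assumed spread of true log2 fold changes
fold_thresholds <- c(1.1, 1.2, 1.5, 2)  # candidate "material" fold changes

# Proportion of genes with |log2 FC| above log2(threshold) under the assumed model
prop_de <- 2 * pnorm(-log2(fold_thresholds) / sd_lfc)
round(setNames(prop_de, fold_thresholds), 3)
```

Under these assumptions the proportion counted as DE falls steeply as the threshold rises, so the answer to "what fraction of genes is DE" depends heavily on the definition chosen.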
> Best wishes
> >[BioC] Limma: correct calculation of B statistics (log odds)
> >Wittner, Ben, Ph.D. Wittner.Ben at mgh.harvard.edu
> >Thu Apr 20 19:40:10 CEST 2006
> >I'm very glad you asked this question. One of the things that has
> >of using limma is that the proportion of differentially expressed
> >genes is often one of the primary things I'm trying to discover from
> >the data, so I feel uneasy making an assumption as to what that
> >proportion is. In your email below, you say that the output of limma
> >is sensitive to the assumption, which, of course, makes me feel even
> >more uneasy about it.
> >I've not noticed any responses on the BioC list. Has anyone
> >issue to you?