Dear Paul,
To see what's going on, it may help to step away from Limma for a
moment and think of a case-control experiment with one array per
subject in the context of a standard mixed-effects ANOVA. The model
for a gene will look like this:
log(intensity) = GrandMean + Group{Fixed} + Array(Group){Random,
confounded with Subject} + replicate(Array){Random}
With this model, you test the difference between the groups against
the variation between arrays. In the special case of a balanced design
(i.e. an equal number of arrays per group, and an equal number of
replicates per array), this test only involves averages of replicates
in each array. In other words, using this model will be equivalent to
using the model
log(averageIntensityOverReplicates) = GrandMean + Group{Fixed} +
Array(Group){Random, confounded with Subject}
after averaging the spots on an array. However, the models are not
equivalent if your design is unbalanced or if you have missing values.
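The balanced-case equivalence can be checked numerically; here is a
small base-R sketch (simulated data — the group sizes and replicate
counts are illustrative, not Paul's actual design):

```r
set.seed(1)
n_arrays <- 6   # 3 arrays per group, one array per subject
n_reps   <- 4   # 4 replicate spots per array
group <- gl(2, n_arrays / 2 * n_reps, labels = c("control", "case"))
array <- gl(n_arrays, n_reps)
y <- rnorm(n_arrays * n_reps) + (as.integer(group) - 1) * 0.5

## Mixed-effects ANOVA: Group is tested in the between-array stratum
fit_full <- aov(y ~ group + Error(array))
F_full <- summary(fit_full)[["Error: array"]][[1]][1, "F value"]

## The same test after averaging the replicates within each array
y_avg <- tapply(y, array, mean)
group_avg <- gl(2, n_arrays / 2, labels = c("control", "case"))
F_avg <- summary(aov(y_avg ~ group_avg))[[1]][1, "F value"]

all.equal(F_full, F_avg)  # the two F statistics agree when balanced
```

With an unbalanced design or missing spots, the two F statistics no
longer coincide.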
The difficulty with the ANOVA is that, when you have a small number of
arrays (=subjects) per group, the variance of Array(Group) cannot be
estimated reliably from a single gene. In Limma, Gordon took a different
approach, and estimated the variance of Array(Group) prior to fitting
the model, using the information across all genes. The advantage is
that the estimator uses more data, and is therefore more reliable. The
disadvantages are that (1) it assumes the same variance of
Array(Group) in all genes, and (2) the subsequent model fit treats
this variance as known rather than estimated from the same data.
Therefore, I second Gordon's opinion: if you have enough arrays per
group and a balanced design, then averaging is appropriate. If you
have few arrays but many technical replicates, estimating this
variance with duplicateCorrelation() may work best.
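In limma code, the two routes look roughly like this (a sketch with
simulated placeholder data, not Paul's experiment; it assumes the
replicate spots sit in consecutive rows, as the ndups/spacing
arguments require, and duplicateCorrelation() additionally needs the
statmod package):

```r
library(limma)
set.seed(1)

## Placeholder data: 2 genes x 4 replicate spots (ndups = 4), 6 arrays
M <- matrix(rnorm(8 * 6), nrow = 8, ncol = 6)
rownames(M) <- rep(c("geneA", "geneB"), each = 4)
grp <- gl(2, 3, labels = c("ctl", "trt"))
design <- model.matrix(~ grp)

## Route 1: average the replicate spots by probe ID, then fit as usual
fit_avg <- eBayes(lmFit(avereps(M), design))

## Route 2: keep the replicates; estimate their common correlation
## across all genes first, then hand it to lmFit
corfit <- duplicateCorrelation(M, design, ndups = 4, spacing = 1)
fit_dup <- eBayes(lmFit(M, design, ndups = 4, spacing = 1,
                        correlation = corfit$consensus.correlation))
topTable(fit_dup, coef = 2)
```

Route 1 tests each gene against between-array variation only; Route 2
borrows the within-array spot information via the consensus
correlation.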
And one more comment: although it is true that in many cases we want
to make inference about populations (e.g. patients) from a random
sample of individuals, this is not always the case. Experiments with a
small number of subjects may not contain enough information about the
underlying population. In this case one may want to restrict the
inference to gene expression in these specific subjects, and only
account for technical variation in the microarray measurements. This
translates into the ANOVA model
log(intensity) = GrandMean + Group{Fixed} + Array(Group){Fixed,
confounded with Subject} + replicate(Array){Random}
In Limma, this will amount to adding "Array" as an extra model term,
and skipping duplicateCorrelation().
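One way to realise this fixed-array analysis in limma is sketched
below (placeholder data again; because Array is confounded with
Subject and nested in Group, the Array coefficients absorb the Group
effect, so the group comparison becomes a contrast between array
means, tested against replicate-spot variation only — the assumption
here is that arrays 1-3 are controls and 4-6 cases):

```r
library(limma)
set.seed(1)
## Placeholder data: 2 genes x 4 replicate spots (ndups = 4), 6 arrays
M <- matrix(rnorm(8 * 6), nrow = 8, ncol = 6)

## One column per individual spot: 6 arrays x 4 spots = 24 columns
uw <- unwrapdups(M, ndups = 4, spacing = 1)
arrayf <- gl(6, 4)            # array (= subject) of each spot column

design <- model.matrix(~ 0 + arrayf)   # Array as a fixed term
fit <- lmFit(uw, design)

## Group difference as a contrast of array means; the residual is the
## within-array (replicate) variation
cont <- makeContrasts(
  (arrayf4 + arrayf5 + arrayf6) / 3 - (arrayf1 + arrayf2 + arrayf3) / 3,
  levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont))
```

Note that the resulting p-values speak only to these particular
subjects, not to the underlying population.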
Best regards
Olga Vitek
Assistant Professor
Statistics and Computer Science
Purdue University
>
>
> ------------------------------
>
> Message: 18
> Date: Thu, 29 Oct 2009 17:50:30 +1100 (AUS Eastern Daylight Time)
> From: Gordon K Smyth <smyth at wehi.edu.au>
> Subject: Re: [BioC] Very low P-values in limma
> To: Paul Geeleher <paulgeeleher at gmail.com>
> Cc: "Mayer, Claus-Dieter" <c.mayer at abdn.ac.uk>, Bioconductor mailing
> list <bioconductor at stat.math.ethz.ch>
> Message-ID: <pine.wnt.4.64.0910291625370.4880 at pc602.alpha.wehi.edu.au>
> Content-Type: text/plain; charset="x-unknown"; Format="flowed"
>
> Dear Paul,
>
> Is it possible that you haven't quoted your professor verbatim?
> These comments don't make sense as they stand. I really don't know
> what he might mean by real p-values or the assumption of zero
> measurement error. Measurement error obviously can't be zero. Nor can
> there be an infinite number of replicates. None of the alternative
> analysis methods we have discussed makes either of these assumptions.
> I wasn't the one arguing for averaging within-array replicates, but
> if that method did assume what you say, then it would have to be an
> invalid method.
>
> On the other hand, your professor is quite right to say that
> within-array replicates measure technical rather than biological
> variability. In a univariate analysis, one would simply average the
> technical replicates. This would give a summary response variable,
> with a variance made up of both biological and technical components,
> with replicates that you could reasonably treat as independent.
>
> In a genewise microarray analysis, averaging the within-array
> replicates has a disadvantage in that it fails to penalize (lower the
> rank of) genes which have high within-array variability. If
> biological variability is high compared to technical, and you have
> enough array replicates to get a decent estimate of between-array
> variability, then averaging the within-array replicates is likely
> still the way to go, just as in a univariate analysis. On the other
> hand, if technical variability (within and between arrays) is
> relatively large compared to biological, and the number of array
> replicates is very small, then the information in the within-array
> variances can be too valuable to ignore. duplicateCorrelation uses
> the fact that the between-array variance has a technical as well as a
> biological component, and that the between and within technical
> components tend to be associated across probes for many microarray
> platforms. It is this last assumption which allows us to make use of
> within-array standard deviations when making inferences about
> between-sample comparisons.
>
> If your priority is to get reliable p-values, and you think you have
> enough array replication to do this, then average the within-array
> replicates. If your array replication is limited, technical
> variability is high, and your priority is to rank the genes, then
> duplicateCorrelation may help. I would add that microarray p-values
> should always be taken with a grain of salt, as it's impossible to
> verify all assumptions in small experiments, and it's useful instead
> to think in terms of independent verification of the results.
>
> This is really as far as I want to debate it. Obviously it's your
> analysis and you should use your own judgement. As a maths graduate
> student, you would be able to read the published duplicateCorrelation
> paper if you want to check the reasoning in detail.
>
> Best wishes
> Gordon
>
>
> On Wed, 28 Oct 2009, Paul Geeleher wrote:
>
>> Dear list,
>>
>> The following are the words of a professor in my department:
>>
>> I still don't get why the 'real' p-values could be better than
>> p-values you get with the assumption of zero measurement error. By
>> averaging over within-array replicates you are not ignoring the
>> within-array replicates; instead you are acting as though there were
>> infinitely many of them, so that the standard error of the
>> expression level within array is zero. Stats is about making
>> inferences about populations from finite samples. The population you
>> are making inferences about is the population of all late-stage
>> breast cancers. The data are from 7 individuals. The within-array
>> replicates give an indication of measurement error of the expression
>> levels but don't give you a handle on the variability of the
>> quantity of interest in the population.
>>
>> Paul
>>
>> On Sat, Oct 24, 2009 at 2:44 AM, Gordon K Smyth <smyth at wehi.edu.au>
>> wrote:
>>>
>>>
>>> On Sat, 24 Oct 2009, Gordon K Smyth wrote:
>>>
>>>> Dear Paul,
>>>>
>>>> Given your consensus correlation value, limma is treating your
>>>> within-array replicates as worth about 1/3 as much as replicates
>>>> on independent arrays (because 1-0.81^2 is about 1/3).
>>>
>>> Sorry, my maths is wrong. The effective weight of the within-array
>>> replicates is quite a bit less than 1/3, given ndups=4 and
>>> cor=0.81.
>>>
>>> Best wishes
>>> Gordon
>>>
>>
>>
>>
>> --
>> Paul Geeleher
>> School of Mathematics, Statistics and Applied Mathematics
>> National University of Ireland
>> Galway
>> Ireland
>>