Hello,
I am using Affymetrix arrays to compare two groups of samples (5 in one
group and 4 in the other). I have been using affy, affylmGUI, and limma
for the analysis. Some of the samples in the latter group have
replicates and I'd like to know how to handle these. My questions are:
1) In one case, the replicate is drawn from the same sample but at a
later time point. Is this a technical replicate (because it is the same
sample but a different chip), a biological replicate (i.e., the different
time point makes it *effectively* a different sample), or neither (and
thus can't be used)?
2) I assume I need to average the gene expression of the technical
replicates before doing the analysis and treat them as *one* sample.
Is this correct?
3) Assuming I do need to average the expression of the technical
replicates, are there methods in affy or limma to do this? Or do I need
to do it manually with something like:
library(affy)
Data <- ReadAffy()
eset <- rma(Data)
# do something here to create a new eset where the technical replicate
# columns have been replaced by a single column averaging the two
# go on to limma analysis
Thanks,
Jonathan
--
Dr Jonathan Arthur
Sesqui Lecturer in Bioinformatics
Central Clinical School, Faculty of Medicine and SUBIT
Medical Foundation Building, K25
University of Sydney
Ph: +61 2 9036 3132
Email: jarthur@med.usyd.edu.au
> Date: Thu, 02 Jun 2005 09:18:01 +1000
> From: Jonathan Arthur <jarthur@med.usyd.edu.au>
> Subject: [BioC] Defining and handling replicates
> To: bioconductor@stat.math.ethz.ch
>
> Hello,
>
> I am using Affymetrix arrays to compare two groups of samples (5 in one
> group and 4 in the other). I have been using affy, affylmGUI, and limma
> for the analysis. Some of the samples in the latter group have
> replicates and I'd like to know how to handle these. My questions are:
Have you read the sections on technical replication in the Limma
User's Guide? That would be the place to start.
> 1) In one case, the replicate is drawn from the same sample but at a
> later time point. Is this a technical replicate (because it is the same
> sample but a different chip), a biological replicate (i.e., the different
> time point makes it *effectively* a different sample), or neither (and
> thus can't be used)?
Does "drawn at a later time point" mean that the RNA was extracted
from the same organism but at a later time? Or does it mean that the
RNA was simply aliquoted later from a stored sample? You would need to
describe your experiment much more fully before people could help you
with the analysis.
> 2) I assume I need to average the gene expression of the technical
> replicates before doing the analysis and treat them as *one* sample.
> Is this correct?
Averaging should be a last resort.
Gordon
> 3) Assuming I do need to average the expression of the technical
> replicates, are there methods in affy or limma to do this? Or do I need
> to do it manually with something like:
>
> Data <- ReadAffy()
> eset <- rma(Data)
> # do something here to create a new eset where the technical replicate
> # columns have been replaced by a single column averaging the two
> # go on to limma analysis
To swap the order of my original two questions:
Gordon K Smyth wrote:
> Does "drawn at a later time point" mean that the RNA was extracted from the
> same organism but at a later time?
The source of mRNA for the microarrays is plates of bacteria cultured
from clinical samples provided by (human) subjects.
In most cases, one patient => one sample => one culture => one RNA
extraction => one microarray. I assume each microarray is a biological
replicate grouped by the clinical status of the patient (disease vs
control).
In one case, however, one patient => one sample => one culture => one
RNA extraction => *two* microarrays. The two arrays were performed
several months apart but come from the same RNA extraction (frozen
during the interim). I assume these are technical replicates.
In another case, one patient => one sample => *two* cultures made
several months apart (sample frozen in interim) => two extractions =>
two microarrays. Is this a biological or technical replicate? The fact
it is from the same patient/sample suggests a technical replicate, but
the different culture suggests a biological replicate??
> Have you read the sections on technical replication in the Limma User's Guide?
> That would be the place to start.
Yes, however I am having difficulty rationalising the section
on "Two Groups: Affymetrix" with the two on "Technical Replication".
If I treat everything as biological replicates, using a group-means
parameterization, the design I use is:
> design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),control=c(0,0,0,0,0,1,1,1,1,1,1,1))
Presumably, I need to do something like:
corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)
checking first to make sure corfit$consensus is positive.
But I am not clear on how to define the block vector?
Thanks for your help.
Jonathan
Thanks for the further explanation.
At 12:20 PM 7/06/2005, Jonathan Arthur wrote:
>To swap the order of my original two questions:
>
>Gordon K Smyth wrote:
>
>>Does "drawn at a later time point" mean that the RNA was extracted from
>>the same organism but at a later time?
>
>The source of mRNA for the microarrays is plates of bacteria cultured
>from clinical samples provided by (human) subjects.
>
>In most cases, one patient => one sample => one culture => one RNA
>extraction => one microarray. I assume each microarray is a biological
>replicate grouped by the clinical status of the patient (disease vs control).
>
>In one case, however, one patient => one sample => one culture => one RNA
>extraction => *two* microarrays. The two arrays were performed several
>months apart but come from the same RNA extraction (frozen during the
>interim). I assume these are technical replicates.
Yes.
>In another case, one patient => one sample => *two* cultures made several
>months apart (sample frozen in interim) => two extractions => two
>microarrays. Is this a biological or technical replicate? The fact it is
>from the same patient/sample suggests a technical replicate, but the
>different culture suggests a biological replicate??
Technical replication refers to any replication which fails to repeat all
the relevant steps, so this is technical replication. However, as you've
explained clearly yourself, in any multistage process there are many
possible levels of technical replication. In your previous example, the
variation between the technical replicates would reflect only the
microarray component of variation. In this case, the variation between
technical replicates reflects variation between cultures and extractions as
well as the variation between microarrays.
>>Have you read the sections on technical replication in the Limma User's
>>Guide?
>>That would be the place to start.
>
>Yes, however I am having difficulty rationalising the section on "Two
>Groups: Affymetrix" with the two on "Technical Replication".
>
>If I treat everything as biological replicates, using a group-means
>parameterization, the design I use is:
>
>>design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),control=c(0,0,0,0,0,1,1,1,1,1,1,1))
>
>Presumably, I need to do something like:
>
>corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
>fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)
>
>checking first to make sure corfit$consensus is positive.
>
>But I am not clear on how to define the block vector?
For an experiment which systematically uses both biological and technical
replication, you would set block=Patient. In your experiment however you
don't have enough technical replication to reliably decompose variability
into biological and technical components, and the technical replication is
inconsistent anyway.
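
For readers wondering what block=Patient looks like in practice, here is a minimal sketch. It assumes a hypothetical layout of 12 arrays (5 disease, 7 control) in which three control patients each contribute two arrays; the thread does not say which arrays are the replicates, so the `patient` labels and the simulated matrix `y` are invented purely for illustration. As noted above, this approach is intended for experiments with systematic technical replication, so it may not be reliable for this particular data set.

```r
# Group-means design for 12 arrays: 5 disease followed by 7 control.
group <- factor(c(rep("disease", 5), rep("control", 7)),
                levels = c("disease", "control"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# One label per array; arrays from the same patient share a label.
# ASSUMPTION: this replicate layout is hypothetical, not taken from the thread.
patient <- c("D1", "D2", "D3", "D4", "D5",
             "C1", "C1", "C2", "C2", "C3", "C3", "C4")
stopifnot(length(patient) == nrow(design))

# The limma calls run only if limma and statmod are installed.
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  set.seed(1)
  ids <- unique(patient)
  # Simulated log-expression values standing in for exprs(eset):
  # a shared patient effect plus array-level noise.
  patient_effect <- matrix(rnorm(200 * length(ids)), nrow = 200)
  y <- patient_effect[, match(patient, ids)] +
    matrix(rnorm(200 * 12, sd = 0.5), nrow = 200)

  corfit <- limma::duplicateCorrelation(y, design, ndups = 1, block = patient)
  print(corfit$consensus)  # check that this is positive before continuing
  fit <- limma::lmFit(y, design, block = patient,
                      correlation = corfit$consensus)
}
```

With real data, `y` would be replaced by the ExpressionSet returned by rma(), and `patient` by the true patient identifiers in array order.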
One approach, which you already have mentioned, is to average over your
technical replicates. This will however invalidate any rigorous statistical
analysis, because the averages will be less variable than the individual
arrays, by an amount which is unknown, because you don't know how much
technical variation you are averaging over.
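
The variance argument can be illustrated with a small simulation. This is only a sketch: the standard deviations `sd_bio` and `sd_tech` are made-up values, and in a real experiment the technical component is exactly the quantity that is unknown.

```r
set.seed(42)
n <- 100000
sd_bio  <- 1.0  # ASSUMED biological (between-patient) standard deviation
sd_tech <- 0.5  # ASSUMED technical (between-array) standard deviation

bio <- rnorm(n, sd = sd_bio)

# A patient measured on a single array.
single <- bio + rnorm(n, sd = sd_tech)

# The same patient measured on two arrays, then averaged.
averaged <- bio + (rnorm(n, sd = sd_tech) + rnorm(n, sd = sd_tech)) / 2

# Theoretical variances: sd_bio^2 + sd_tech^2   = 1.25  (single array)
#                        sd_bio^2 + sd_tech^2/2 = 1.125 (average of two)
var_single   <- var(single)
var_averaged <- var(averaged)
print(c(single = var_single, averaged = var_averaged))
```

Mixing averaged and single-array columns in one analysis therefore combines observations with different variances, and the size of the difference depends on the unknown technical component.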
The simplest approach for you would be to choose what you think are
the best arrays for the two patients for whom you have replicates, and
discard the two superfluous arrays.
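
Keeping one array per patient amounts to subsetting the columns of the expression data and the matching rows of the design. A minimal sketch follows, again using an invented layout of 12 arrays; the `patient` labels and the choice in `keep` are hypothetical, and which array is "best" for each patient is a judgment the analyst has to make.

```r
# ASSUMPTION: hypothetical patient labels for 12 arrays, in array order.
patient <- c("D1", "D2", "D3", "D4", "D5",
             "C1", "C1", "C2", "C2", "C3", "C3", "C4")

# Group-means design matching the one quoted earlier in the thread.
design <- cbind(disease = as.numeric(seq_along(patient) <= 5),
                control = as.numeric(seq_along(patient) > 5))

# Hypothetical choice: indices of the arrays judged best, one per patient.
keep <- c(1:5, 6, 8, 10, 12)
stopifnot(!anyDuplicated(patient[keep]),
          setequal(patient[keep], patient))

design_kept <- design[keep, , drop = FALSE]

# With real data the same index would subset the arrays, e.g.
#   eset_kept <- eset[, keep]
#   fit <- lmFit(eset_kept, design_kept)
```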
Alternatively, there is a trick which would allow you to use all your
arrays. But it requires a feature of the lmFit() function which I don't
wish to publicly document yet, as it would be easy to misuse, so I will
write to you offline.
Gordon