Question

Defining and handling replicates

Jonathan Arthur ▴ 70

@jonathan-arthur-1200

Hello,

I am using Affymetrix arrays to compare two groups of samples (5 in one group and 4 in the other). I have been using affy, affylmGUI, and limma for the analysis. Some of the samples in the latter group have replicates and I'd like to know how to handle these. My questions are:

1) In one case, the replicate is drawn from the same sample but at a later time point. Is this a technical replicate (because it is the same sample but a different chip), a biological replicate (i.e., the different time point makes it *effectively* a different sample), or neither (and thus can't be used)?

2) I assume I need to average the gene expression of the technical replicates before doing the analysis and treat them as *one* sample. Is this correct?

3) Assuming I do need to average the expression of the technical replicates, are there methods in affy or limma to do this? Or do I need to do it manually with something like:

    Data <- ReadAffy()
    eset <- rma(Data)
    # do something here to create a new eset where the technical replicate
    # columns have been replaced by a single column averaging the two
    # go on to limma analysis

Thanks,

Jonathan

--
Dr Jonathan Arthur
Sesqui Lecturer in Bioinformatics
Central Clinical School, Faculty of Medicine and SUBIT
Medical Foundation Building, K25
University of Sydney
Ph: +61 2 9036 3132
Email: jarthur@med.usyd.edu.au

GO affy limma affylmGUI • 1.5k views

ADD COMMENT • link updated 20.4 years ago by Gordon Smyth 53k • written 20.4 years ago by Jonathan Arthur ▴ 70

score 0 · Answer 1 · 2005-06-02

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

> Date: Thu, 02 Jun 2005 09:18:01 +1000
> From: Jonathan Arthur <jarthur@med.usyd.edu.au>
> Subject: [BioC] Defining and handling replicates
> To: bioconductor@stat.math.ethz.ch
>
> Hello,
>
> I am using Affymetrix arrays to compare two groups of samples (5 in one
> group and 4 in the other). I have been using affy, affylmGUI, and limma
> for the analysis. Some of the samples in the latter group have
> replicates and I'd like to know how to handle these. My questions are:

Have you read the sections on technical replication in the Limma User's Guide? That would be the place to start.

> 1) In one case, the replicate is drawn from the same sample but at a
> later time point. Is this a technical replicate (because it is the same
> sample but a different chip), a biological replicate (i.e., the different
> time point makes it *effectively* a different sample), or neither (and
> thus can't be used)?

Does "drawn at a later time point" mean that the RNA was extracted from the same organism but at a later time? Or does it mean that the RNA was simply aliquoted later from a stored sample? You would need to describe your experiment much more fully before people could help you with the analysis.

> 2) I assume I need to average the gene expression of the technical
> replicates before doing the analysis and treat them as *one* sample.
> Is this correct?

Last resort.

Gordon

> 3) Assuming I do need to average the expression of the technical
> replicates, are there methods in affy or limma to do this? Or do I need
> to do it manually with something like:
>
> Data <- ReadAffy()
> eset <- rma(Data)
> # do something here to create a new eset where the technical replicate
> # columns have been replaced by a single column averaging the two
> # go on to limma analysis
>
> Thanks,
>
> Jonathan
>
> --
> Dr Jonathan Arthur
> Sesqui Lecturer in Bioinformatics
> Central Clinical School, Faculty of Medicine and SUBIT
> Medical Foundation Building, K25
> University of Sydney
> Ph: +61 2 9036 3132
> Email: jarthur@med.usyd.edu.au
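For reference, the averaging step asked about in question 3 can be sketched in base R. This is only a minimal sketch on a made-up expression matrix: the array names, the values, and the `sample_id` vector are all hypothetical, and with real data the matrix would presumably come from `exprs(eset)`. Bear in mind that the answer above treats averaging as a last resort.

```r
# Toy log-expression matrix: 3 probesets x 5 arrays (values invented).
expr <- matrix(c(8.0, 8.2, 7.9, 8.4, 8.6,
                 5.1, 5.0, 5.3, 4.9, 5.1,
                 9.7, 9.9, 9.6, 9.8, 10.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("probe", 1:3),
                               c("A1", "A2", "A3", "A4", "A5")))

# Hypothetical mapping of arrays to samples: A4 and A5 are assumed to be
# technical replicates of the same sample, S4.
sample_id <- c("S1", "S2", "S3", "S4", "S4")

# Average the columns that share a sample ID, keeping first-seen order.
ids <- unique(sample_id)
expr_avg <- sapply(ids, function(id) {
  rowMeans(expr[, sample_id == id, drop = FALSE])
})

# One column per sample; replicated samples hold the mean of their arrays.
print(expr_avg)
```

The result has one column per sample, so any design matrix used afterwards needs one row per sample rather than one per array. Current limma releases also include an `avearrays()` function that averages columns sharing an ID, which may be more convenient than doing this by hand.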

ADD COMMENT • link 20.4 years ago Gordon Smyth 53k

To swap the order of my original two questions:

Gordon K Smyth wrote:
> Does "drawn at a later time point" mean that the RNA was extracted from the
> same organism but at a later time?

The source of mRNA for the microarrays is plates of bacteria cultured from clinical samples provided by (human) subjects.

In most cases, one patient => one sample => one culture => one RNA extraction => one microarray. I assume each microarray is a biological replicate grouped by the clinical status of the patient (disease vs control).

In one case, however, one patient => one sample => one culture => one RNA extraction => *two* microarrays. The two arrays were performed several months apart but come from the same RNA extraction (frozen during the interim). I assume these are technical replicates.

In another case, one patient => one sample => *two* cultures made several months apart (sample frozen in interim) => two extractions => two microarrays. Is this a biological or technical replicate? The fact that it is from the same patient/sample suggests a technical replicate, but the different culture suggests a biological replicate?

> Have you read the sections on technical replication in the Limma User's Guide?
> That would be the place to start.

Yes, however I am having difficulty reconciling the section on "Two Groups: Affymetrix" with the two on "Technical Replication".

If I treat everything as biological replicates, using a group-means parameterization, the design I use is:

    design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),
                    control=c(0,0,0,0,0,1,1,1,1,1,1,1))

Presumably, I need to do something like:

    corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
    fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)

checking first to make sure corfit$consensus is positive.

But I am not clear on how to define the block vector.

Thanks for your help.

Jonathan
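As a side note, the same group-means design can be generated from a factor instead of typing the 0/1 columns by hand. A minimal sketch, assuming the 12 arrays are ordered as in the `cbind()` call (5 disease arrays, then 7 control arrays); the `group` variable is introduced here purely for illustration.

```r
# Group labels in array order: 5 disease arrays followed by 7 control arrays.
group <- factor(rep(c("disease", "control"), times = c(5, 7)),
                levels = c("disease", "control"))

# Group-means parameterization: one indicator column per group, no intercept.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

print(design)
```

Building the design from a factor keeps it in step with the sample annotation if arrays are later dropped or reordered.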

ADD REPLY • link 20.4 years ago Jonathan Arthur ▴ 70

Thanks for the further explanation.

At 12:20 PM 7/06/2005, Jonathan Arthur wrote:
>To swap the order of my original two questions:
>
>Gordon K Smyth wrote:
>
>>Does "drawn at a later time point" mean that the RNA was extracted from
>>the same organism but at a later time?
>
>The source of mRNA for the microarrays is plates of bacteria cultured
>from clinical samples provided by (human) subjects.
>
>In most cases, one patient => one sample => one culture => one RNA
>extraction => one microarray. I assume each microarray is a biological
>replicate grouped by the clinical status of the patient (disease vs control).
>
>In one case, however, one patient => one sample => one culture => one RNA
>extraction => *two* microarrays. The two arrays were performed several
>months apart but come from the same RNA extraction (frozen during the
>interim). I assume these are technical replicates.

Yes.

>In another case, one patient => one sample => *two* cultures made several
>months apart (sample frozen in interim) => two extractions => two
>microarrays. Is this a biological or technical replicate? The fact that it is
>from the same patient/sample suggests a technical replicate, but the
>different culture suggests a biological replicate?

Technical replication refers to any replication which fails to repeat all the relevant steps, so this is technical replication. However, as you've explained clearly yourself, in any multistage process there are many possible levels of technical replication. In your previous example, the variation between the technical replicates would reflect only the microarray component of variation. In this case, the variation between technical replicates reflects variation between cultures and extractions as well as the variation between microarrays.

>>Have you read the sections on technical replication in the Limma User's
>>Guide?
>>That would be the place to start.
>
>Yes, however I am having difficulty reconciling the section on "Two
>Groups: Affymetrix" with the two on "Technical Replication".
>
>If I treat everything as biological replicates, using a group-means
>parameterization, the design I use is:
>
>>design <- cbind(disease=c(1,1,1,1,1,0,0,0,0,0,0,0),
>>                control=c(0,0,0,0,0,1,1,1,1,1,1,1))
>
>Presumably, I need to do something like:
>
>corfit <- duplicateCorrelation(eset, design, ndups=1, block=c(???))
>fit <- lmFit(eset, design, block=c(???), correlation=corfit$consensus)
>
>checking first to make sure corfit$consensus is positive.
>
>But I am not clear on how to define the block vector.

For an experiment which systematically uses both biological and technical replication, you would set block=Patient. In your experiment, however, you don't have enough technical replication to reliably decompose variability into biological and technical components, and the technical replication is inconsistent anyway.

One approach, which you have already mentioned, is to average over your technical replicates. This will, however, invalidate any rigorous statistical analysis, because the averages will be less variable than the individual arrays, by an amount which is unknown, because you don't know how much technical variation you are averaging over.

The simplest approach for you would be to choose what you think are the best arrays for the two patients for whom you have replicates, and discard the two superfluous arrays.

Alternatively, there is a trick which would allow you to use all your arrays. But it requires a feature of the lmFit() function which I don't wish to publicly document yet, as it would be easy to misuse, so I will write to you offline.

Gordon

>Thanks for your help.
>
>Jonathan
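The two simpler options discussed above can be sketched roughly as follows. The `patient` vector is invented for illustration, since the thread never says which arrays are the replicates; `eset` and `design` are assumed to be the objects from the earlier posts. The limma calls are left as comments because they need the real data, and they only repeat the calls already quoted in the thread with the block vector filled in.

```r
# Hypothetical patient labels, one per array, in the same order as the
# columns of eset. Repeated labels mark technical replicates (invented here).
patient <- c("P1", "P2", "P3", "P4", "P5",               # disease arrays
             "P6", "P6", "P7", "P7", "P8", "P9", "P10")  # control arrays

# Option 1: block on patient. As noted above, this suits experiments with
# systematic technical replication, which is not the case here.
# library(limma)
# corfit <- duplicateCorrelation(eset, design, ndups = 1, block = patient)
# fit <- lmFit(eset, design, block = patient,
#              correlation = corfit$consensus)

# Option 2: keep one array per patient and drop the superfluous ones.
# This keeps the first array seen for each patient; in practice the choice
# would be based on which array is judged to be of better quality.
keep <- !duplicated(patient)
print(which(!keep))  # indices of the arrays that would be discarded

# eset_reduced   <- eset[, keep]
# design_reduced <- design[keep, , drop = FALSE]
```

Whichever arrays are dropped, the design matrix has to be subset with the same index so that its rows still line up with the remaining columns of the expression set.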

ADD REPLY • link 20.4 years ago Gordon Smyth 53k