technical replicates


William Kenworthy ▴ 70

@william-kenworthy-322

Last seen 9.6 years ago

Hi, I have just been passed a set of affy data that consists of 3 states, with two technical replicates of each state (6 chips overall).

1. What's the best way (normalisations, algorithms) to leverage technical replicates?
2. How do you tell algorithms such as rma which are the replicates? (I presume the phenoData in the AffyBatch specifies this, but the examples are in a binary format, so you can't open the raw data file and see how they are put together!)

BillK

affy • 1.4k views

ADD COMMENT • link updated 20.3 years ago by Naomi Altman ★ 6.0k • written 20.3 years ago by William Kenworthy ▴ 70


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

I usually use mixed model ANOVA to determine differential expression. By putting in a random effects term for sample (you have 2 technical reps per sample) you end up with the correct analysis.

However, in your case you have no sample replication. As a result, you cannot do a statistically valid ANOVA. People do use various approximations: the Affy-type Wilcoxon tests, which treat the probes as replicates (makes me very very nervous), or using the technical replicates as if they were biological replicates (makes me very nervous, since there is no guarantee that there is much relationship between the technical and biological variation).

Of course, the best thing to do is to convince the investigators that biological replication is far more important than technical replication. In most Affy studies, technical replication will not be cost-effective due to the relatively small technical variance and the large cost of individual arrays.

Naomi S. Altman, Associate Professor, Dept. of Statistics, Penn State University
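The mixed-model idea above can be sketched for a single gene in base R. This is a minimal illustration under stated assumptions, not the poster's actual workflow: the data are simulated, the design (2 treatments, 3 biological samples each, 2 technical replicates per sample) is hypothetical and deliberately includes the biological replication that the 6-chip experiment lacks, and the names `d`, `bio`, `sample_means` and `sample_trt` are invented for the example.

```r
# Hypothetical data for ONE gene:
# 2 treatments x 3 biological samples x 2 technical replicates = 12 values.
set.seed(1)
d <- data.frame(
  treatment = factor(rep(c("A", "B"), each = 6)),
  sample    = factor(rep(paste0("s", 1:6), each = 2))
)

# Simulated log2 expression: baseline + treatment shift
# + biological (sample) effect + small technical noise.
bio <- rnorm(6, sd = 0.5)
d$expr <- 8 + ifelse(d$treatment == "B", 1, 0) +
  bio[as.integer(d$sample)] + rnorm(12, sd = 0.1)

# Sample enters as an error stratum (random effect), so treatment is tested
# against between-sample variation rather than against technical noise.
fit <- aov(expr ~ treatment + Error(sample), data = d)
print(summary(fit))

# In a balanced design, the treatment test matches a pooled-variance t-test
# on the per-sample means (technical replicates averaged first).
sample_means <- as.numeric(tapply(d$expr, d$sample, mean))
sample_trt   <- factor(rep(c("A", "B"), each = 3))
tt <- t.test(sample_means ~ sample_trt, var.equal = TRUE)
print(tt$p.value)
```

With only one sample per state, as in the original question, the sample stratum has no residual degrees of freedom, which is the sense in which a valid ANOVA is not available.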

ADD COMMENT • link 20.3 years ago Naomi Altman ★ 6.0k


Unfortunately, they came to us too late and were forced by circumstances to complete the experiment as designed. Next time ...

I am thinking a straight average of the replicate values for each gene may be the best solution (not per probe). However, a significance value may show up a suspect gene (if a difference exists across the replicates, something is wrong in a yes/no fashion, rather than looking at degrees of significance).

My preference is to use RMA from the affy package as a first pass, then expand using some of the other algorithms, but concrete examples (and documentation) on how to specify replicates are lacking (i.e., the data examples say there are replicates, but how are they specified/handled, or not?).

Billk
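The averaging approach described above can be sketched in base R. This is a rough illustration under stated assumptions: the matrix `mat` is simulated and stands in for a gene-by-array matrix of RMA values, the grouping vector `state` is something the analyst supplies (as far as I know, `rma` itself does not take replicate information), and the `cutoff` of 1 log2 unit is arbitrary.

```r
# Hypothetical stand-in for an RMA expression matrix (genes x arrays, log2 scale).
# With real CEL files this would come from something like exprs(rma(ReadAffy()))
# in the affy package; here the values are simulated.
set.seed(2)
state <- factor(rep(c("state1", "state2", "state3"), each = 2))
mat <- matrix(rnorm(5 * 6, mean = 8), nrow = 5,
              dimnames = list(paste0("gene", 1:5), paste0("chip", 1:6)))

# Column indices of the arrays belonging to each state.
cols_by_state <- split(seq_along(state), state)

# Average the technical replicates within each state, gene by gene.
state_means <- sapply(cols_by_state,
                      function(cols) rowMeans(mat[, cols, drop = FALSE]))

# Absolute difference between the two replicates of each state;
# a large value flags a gene whose replicates disagree.
rep_diff <- sapply(cols_by_state,
                   function(cols) abs(mat[, cols[1]] - mat[, cols[2]]))

cutoff  <- 1                  # arbitrary log2 threshold, chosen for illustration
suspect <- rep_diff > cutoff  # yes/no flag per gene and state

print(state_means)
print(suspect)
```

The replicate structure lives entirely in the `state` vector, which could equally be a column of the phenoData; the averaging and the yes/no flag are ordinary matrix operations applied after normalisation.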

ADD REPLY • link 20.3 years ago William Kenworthy ▴ 70


Dear Bill,

Perhaps I have not understood your problem. You could normalize your data as suggested. After that, if you want to get some type of significance value, you do a two-sample test such as a t-test or Wilcoxon test. To date I am using a gene-by-gene analysis, applying the function to each row of the expression matrix (see the affy command "exprs" and the R commands "apply", "t.test" and "wilcox.test"). There are probably more efficient ways to do this. This can also be done in limma, using the design (1 1 1 -1 -1 -1), assuming the first 3 arrays come from one condition and the other 3 from the other.

To detect outliers, I usually do MvA plots or array x array expression plots (on the log scale).

Array x array plots are the 3 plots arising from all pairs of arrays from the same treatment, with expression from array i on the x-axis and expression from array j on the y-axis. Outliers are points far from the diagonal.

MvA plots are likewise the 3 plots arising from all pairs of arrays from the same treatment. The x-axis is the expression value averaged over the two arrays, and the y-axis is the difference in expression values between the two arrays. This generally looks like a sideways raindrop, with the wide end near the origin. Again, points away from the point cloud are outliers.

--Naomi
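The gene-by-gene test and the MvA quantities described above can be sketched in base R. This is a minimal example on simulated data, assuming a log2 expression matrix with the first 3 arrays from one condition and the last 3 from the other (the layout used in the reply, not the 3-state design of the original question); `mat`, `group`, `pvals`, `A` and `M` are names invented for the illustration.

```r
# Hypothetical log2 expression matrix: 100 genes x 6 arrays.
set.seed(3)
mat <- matrix(rnorm(100 * 6, mean = 8), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("array", 1:6)))
mat[1:5, 4:6] <- mat[1:5, 4:6] + 2   # simulate a shift in the first five genes
group <- factor(rep(c("cond1", "cond2"), each = 3))

# Gene-by-gene two-sample t-test: apply the test to each row of the matrix.
pvals <- apply(mat, 1, function(x) {
  t.test(x[group == "cond1"], x[group == "cond2"])$p.value
})
print(head(sort(pvals)))

# M and A values for one pair of arrays from the same condition.
A <- (mat[, 1] + mat[, 2]) / 2   # average expression (x-axis)
M <- mat[, 1] - mat[, 2]         # difference in expression (y-axis)

# Draw the MvA plot only in an interactive session.
if (interactive()) {
  plot(A, M, main = "MvA: array1 vs array2")
  abline(h = 0, lty = 2)         # points far from this line are candidate outliers
}
```

Swapping `t.test` for `wilcox.test` in the row function gives the rank-based version; repeating the `A`/`M` step for each pair of arrays within a treatment gives the full set of plots.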

ADD REPLY • link 20.3 years ago Naomi Altman ★ 6.0k
