Different levels of replicates and how to create a correct targets file out of that.


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 4.0 years ago

United States

Material relevant to this discussion can be found under the thread with subject line: "technical replicates (again!): a summary"

At 02:21 AM 3/31/2004, Johan Lindberg wrote:
>Thank you for the answer, but I think that my situation is a little bit
>different. First of all, I wonder about the answer that was given in
>https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html
>He has got 30 individuals with 4-6 replicates of each. This would mean
>that 120-160 hybridizations have been done. The example targets file
>that is given looks something like this:
>
>Cy3        Cy5
>Patient1   Control
>Control    Patient1
>Patient1   Control
>Patient2   Control
>Control    Patient2
>...
>
>Here is where I get confused, because it looks as if the technical
>replicates are included in the targets file (on the same level as the
>biological replicates) and should therefore also be included in a
>following contrast matrix. But the contrast matrix given,
>cont.matrix <- matrix(1,30,1)
>is just a column of 30 ones (he had 30 patients in the study), which
>indicates that only the true biological replicates would be included in
>the B-statistic analysis?
>
>Back to my experiment. My real problem, I think, is that I have no common
>reference between the different samples. In the example above he has got
>this "control" used in the hybridizations. But I have hybridized a biopsy
>taken before and a biopsy taken after treatment for each individual.
>
>Cy3               Cy5
>Patient1 before   Patient1 after
>Patient1 after    Patient1 before
>Patient2 before   Patient2 after
>...
>
>But since the effect I am looking for is the effect of the treatment, not
>the between-patients effect, would it be correct to use the same approach
>as the given example
>https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html
>even though I have no common reference?
>
>Another question that was not answered is how to treat replicates on
>different levels, since I have 1-2 biopsies taken from different
>individuals plus technical replicates of each. Is there a way of dealing
>with this kind of thing in LIMMA? Should one just average over lower
>levels of replicates and then put only true biological replicates in
>the targets file/contrast matrix?
>
>Best regards
>
>/ Johan Lindberg
>
>At 10:32 2004-03-31 +1000, Gordon Smyth wrote:
>>At 11:51 PM 30/03/2004, Johan Lindberg wrote:
>>>Sorry, I forgot to have a subject on the mail I sent before.
>>>
>>>Hello everyone.
>>>I would really appreciate some comments/hints/help with a pretty long
>>>question.
>>
>>This question has been asked on the list before. See:
>>
>>https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html
>>
>>The simplest treatment in limma is simply to treat your experiment as
>>having two factors, one factor having 10 levels indicating the patient
>>and one taking two levels, before or after. This treatment is analogous
>>to a paired test or to a two-way analysis of variance.
>>
>>An alternative treatment would be to treat the patients as random
>>effects. That would also be a correct treatment, and potentially a little
>>more powerful, but also much more difficult, and I don't think you gain
>>very much.
>>
>>>I have an experiment consisting of 18 hybridizations. On the 30K cDNA
>>>arrays, knee joint biopsies (from different patients) before and after a
>>>certain treatment are hybridized. What I want to find out is the effect
>>>of the treatment, not the difference between the patients. The problem
>>>is how to deal with different levels of replicates and how to create a
>>>correct targets file, since I have no common reference.
>>>This is what the experimental set-up looks like.
>>>
>>>Patient  Hybridization  Cy3                        Cy5
>>>1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>>         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>>4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
>>>         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
>>>5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
>>>         9B             Biopsy 1 after treatment   Biopsy 1 before treatment
>>>
>>>As you can see, different patients have one or two biopsies taken from
>>>them. I realize it would be a mistake to include all of those in the
>>>targets file, because if I have more measurements of a certain patient,
>>>that would bias the ranking of the B-statistic towards the patient having
>>>the most biopsies, right? Or?
>>>Since the differentially expressed genes in the patient with more
>>>biopsies will get smaller variance?
>>>
>>>My solution to the problem was just to create an artificial M-matrix
>>>twice as long as the original MA object. For the patients with two
>>>biopsies, I averaged over the technical replicates (dye-swaps) and put
>>>the values from biopsy one and then the values from biopsy two in the
>>>matrix. For patients with just a technical replicate, I put the values
>>>from hybridization 1A and then hybridization 1B into the matrix.
>>>
>>>The M-values of that matrix object would look something like:
>>>
>>>                  patient1          patient3                            ....
>>>Rows 1-30000      Hybridization 1A  Average of hybridization 2A and 2B  ....
>>>Rows 30001-60000  Hybridization 1B  Average of hybridization 3A and 3B  ....
>>>
>>>After this I plan to use dupcor on the new matrix of M-values, as if I
>>>had a slide with replicate spots on it.
>>>
>>>So far so good, or? Is this a good way of treating replicates on
>>>different levels, or does anyone have a better idea of how to do this?
>>>Comments please.....
>>>
>>>And now, how to create a correct targets file, since I have no common
>>>reference.
>>>I guess it would look something like this:
>>>
>>>SlideNumber  Name     FileName   Cy3        Cy5
>>>1            pat1_p   test1.gpr  Before_p1  After_p1
>>>2            pat3_p   test2.gpr  Before_p2  After_p2
>>>3            pat4_p   test3.gpr  Before_p3  After_p3
>>>4            pat6_p   test4.gpr  Before_p4  After_p4
>>>5            pat7_p   test5.gpr  Before_p5  After_p5
>>>6            pat10_p  test6.gpr  Before_p6  After_p6
>>>
>>>But when I want to make my contrast matrix I am lost, since I do not have
>>>anything to write as ref.
>>>design <- modelMatrix(targets, ref="????????")
>>>
>>>If I redo the targets file as
>>>
>>>SlideNumber  Name     FileName   Cy3       Cy5
>>>1            pat1_p   test1.gpr  Before_p  After_p
>>>2            pat3_p   test2.gpr  Before_p  After_p
>>>3            pat4_p   test3.gpr  Before_p  After_p
>>>4            pat6_p   test4.gpr  Before_p  After_p
>>>5            pat7_p   test5.gpr  Before_p  After_p
>>>6            pat10_p  test6.gpr  Before_p  After_p
>>>
>>>wouldn't that be the same as treating this as a common reference design
>>>when it is not? And wouldn't that affect the variance of the experiment?
>>>How do I do this in a correct way?
>>>I looked at the zebrafish example in the LIMMA user guide, but isn't that
>>>wrong as well? Because technical and biological replicates are treated
>>>the same way in the targets file of the zebrafish.
>>
>>Dye-swap pairs are not necessarily technical replicates.
>>
>>>I realize that many of these questions should have been considered
>>>before conducting the lab part, but unfortunately they were not. So I
>>>will not be surprised if someone sends me the same quote as I got
>>>yesterday from a friend:
>>>
>>>"To consult a statistician after an experiment is finished is often
>>>merely to ask him to conduct a post mortem examination. He can perhaps
>>>say what the experiment died of."
>>>- R.A. Fisher, Presidential Address to the First Indian Statistical
>>>Congress, 1938
>>>
>>>Best regards
>>
>>Gordon
>>
>>>/Johan Lindberg
>>>
>>>_______________________________________________
>>>Bioconductor mailing list
>>>Bioconductor@stat.math.ethz.ch
>>>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor

Naomi S. Altman                    814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                814-863-7114 (fax)
Penn State University              814-865-1348 (Statistics)
University Park, PA 16802-2111
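The two-factor treatment described in Gordon's quoted reply (a patient factor plus a before/after factor, "analogous to a paired test") can be illustrated on made-up numbers. The sketch below is only an illustration under stated assumptions: it uses base R, simulated single-channel-style values for one gene, and hypothetical names (`patient`, `treatment`, `y`). It is not an analysis of the two-colour arrays in this thread, where each M-value is already a within-array log-ratio.

```r
# Simulated data for ONE gene: 6 hypothetical patients, each measured
# before and after treatment. All numbers are invented for illustration.
set.seed(1)
n_patients <- 6
patient   <- factor(rep(paste0("P", seq_len(n_patients)), each = 2))
treatment <- factor(rep(c("before", "after"), times = n_patients),
                    levels = c("before", "after"))

patient_effect <- rnorm(n_patients, sd = 1)          # between-patient variation
y <- patient_effect[as.integer(patient)] +
     0.5 * (treatment == "after") +                  # assumed treatment effect
     rnorm(length(patient), sd = 0.3)                # measurement noise

# Two-factor design: one column per patient level plus a before/after column.
design <- model.matrix(~ patient + treatment)

# Treatment t-statistic from the two-way (patient + treatment) model.
fit <- lm(y ~ patient + treatment)
t_twoway <- coef(summary(fit))["treatmentafter", "t value"]

# The same comparison as a paired t-test on after vs. before.
t_paired <- unname(t.test(y[treatment == "after"],
                          y[treatment == "before"],
                          paired = TRUE)$statistic)

print(c(two_way = t_twoway, paired = t_paired))
```

With complete pairs the two statistics agree, which is why blocking on patient removes the between-patient differences from the treatment comparison.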

limma limma • 974 views

ADD COMMENT • link updated 21.0 years ago by Gordon Smyth 52k • written 21.0 years ago by Naomi Altman ★ 6.0k


Gordon Smyth 52k

@gordon-smyth

Last seen 15 hours ago

WEHI, Melbourne, Australia

Dear Johan,

Now that I've had a chance to read your email more thoroughly, I think you actually have a clever approach.

At 11:51 PM 30/03/2004, Johan Lindberg wrote:
>Sorry, I forgot to have a subject on the mail I sent before.
>
>Hello everyone.
>I would really appreciate some comments/hints/help with a pretty long
>question.
>
>I have an experiment consisting of 18 hybridizations. On the 30K cDNA
>arrays, knee joint biopsies (from different patients) before and after a
>certain treatment are hybridized. What I want to find out is the effect of
>the treatment, not the difference between the patients. The problem is how
>to deal with different levels of replicates and how to create a correct
>targets file, since I have no common reference.
>This is what the experimental set-up looks like.
>
>Patient  Hybridization  Cy3                        Cy5
>1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
>3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
>         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
>         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
>4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
>         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
>         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
>5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
>6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
>7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
>10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
>         9B             Biopsy 1 after treatment   Biopsy 1 before treatment

You have an unbalanced design with three error strata: patient, biopsy, microarray. In principle one would like to treat this using a model with nested random effects but, as recent discussion has indicated, this is not so straightforward.

>As you can see, different patients have one or two biopsies taken from
>them. I realize it would be a mistake to include all of those in the
>targets file, because if I have more measurements of a certain patient,
>that would bias the ranking of the B-statistic towards the patient having
>the most biopsies, right? Or?
>Since the differentially expressed genes in the patient with more biopsies
>will get smaller variance?
>
>My solution to the problem was just to create an artificial M-matrix twice
>as long as the original MA object. For the patients with two biopsies, I
>averaged over the technical replicates (dye-swaps) and put the values from
>biopsy one and then the values from biopsy two in the matrix. For
>patients with just a technical replicate, I put the values from
>hybridization 1A and then hybridization 1B into the matrix.
>
>The M-values of that matrix object would look something like:
>
>                  patient1          patient3                            ....
>Rows 1-30000      Hybridization 1A  Average of hybridization 2A and 2B  ....
>Rows 30001-60000  Hybridization 1B  Average of hybridization 3A and 3B  ....
>
>After this I plan to use dupcor on the new matrix of M-values, as if I
>had a slide with replicate spots on it.
>
>So far so good, or? Is this a good way of treating replicates on different
>levels, or does anyone have a better idea of how to do this? Comments
>please.....

This is actually very clever. You've got rid of one error stratum by averaging, and then use duplicateCorrelation to handle the other. I think your approach is a good one *but* you need to give double weight to cases where you have averaged over two technical replicates. Use the 'weights' component of your MAList object to do this.

>And now, how to create a correct targets file, since I have no common
>reference.
>I guess it would look something like this:
>
>SlideNumber  Name     FileName   Cy3        Cy5
>1            pat1_p   test1.gpr  Before_p1  After_p1
>2            pat3_p   test2.gpr  Before_p2  After_p2
>3            pat4_p   test3.gpr  Before_p3  After_p3
>4            pat6_p   test4.gpr  Before_p4  After_p4
>5            pat7_p   test5.gpr  Before_p5  After_p5
>6            pat10_p  test6.gpr  Before_p6  After_p6
>
>But when I want to make my contrast matrix I am lost, since I do not have
>anything to write as ref.
>design <- modelMatrix(targets, ref="????????")

If I have understood your approach, you don't need to do anything about the targets file or the design matrix. Just use design <- rep(1,6). You now have independent M-values estimating the same thing.

Gordon

>If I redo the targets file as
>
>SlideNumber  Name     FileName   Cy3       Cy5
>1            pat1_p   test1.gpr  Before_p  After_p
>2            pat3_p   test2.gpr  Before_p  After_p
>3            pat4_p   test3.gpr  Before_p  After_p
>4            pat6_p   test4.gpr  Before_p  After_p
>5            pat7_p   test5.gpr  Before_p  After_p
>6            pat10_p  test6.gpr  Before_p  After_p
>
>wouldn't that be the same as treating this as a common reference design
>when it is not? And wouldn't that affect the variance of the experiment?
>How do I do this in a correct way?
>I looked at the zebrafish example in the LIMMA user guide, but isn't that
>wrong as well? Because technical and biological replicates are treated the
>same way in the targets file of the zebrafish.
>
>I realize that many of these questions should have been considered before
>conducting the lab part, but unfortunately they were not. So I will not be
>surprised if someone sends me the same quote as I got yesterday from a friend:
>
>"To consult a statistician after an experiment is finished is often merely
>to ask him to conduct a post mortem examination. He can perhaps say what
>the experiment died of."
>- R.A. Fisher, Presidential Address to the First Indian Statistical
>Congress, 1938
>
>Best regards
>
>/Johan Lindberg
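The stacking-and-weighting approach discussed in this answer might be sketched roughly as follows. This is a toy illustration, not the poster's actual code: the M-values are simulated, the layout covers only three hypothetical patients (`pat1` with one biopsy, `pat3` and `pat4` with two), and the names `layout`, `M_signed`, `M_stacked` and `W_stacked` are invented here. It also assumes that M is log2(Cy5/Cy3) and that the "B" (dye-swapped) arrays must be sign-flipped before averaging so that every value estimates after-versus-before; that step is not spelled out in the thread. The limma calls are guarded so the data-preparation part runs on its own.

```r
set.seed(2)
n_genes <- 50                                   # stand-in for ~30000 spots
hyb <- c("1A", "1B", "2A", "2B", "3A", "3B", "4A", "4B", "5A", "5B")
M <- matrix(rnorm(n_genes * length(hyb)), n_genes, length(hyb),
            dimnames = list(NULL, hyb))         # simulated M-values

# ASSUMPTION: "A" arrays have before=Cy3/after=Cy5, "B" arrays are swapped,
# so B arrays are multiplied by -1 to point in the after-vs-before direction.
dye_sign <- ifelse(grepl("B$", hyb), -1, 1)
M_signed <- sweep(M, 2, dye_sign, FUN = "*")

# Hypothetical layout: for each patient, the arrays feeding the first and
# second "duplicate" block of the stacked matrix.
layout <- list(
  pat1 = list("1A", "1B"),                      # one biopsy: the two dye-swap arrays
  pat3 = list(c("2A", "2B"), c("3A", "3B")),    # two biopsies: average each pair
  pat4 = list(c("4A", "4B"), c("5A", "5B"))
)
block_mean <- function(h) rowMeans(M_signed[, h, drop = FALSE])

# Stacked matrix: rows 1..n_genes = first block, rows n_genes+1..2*n_genes = second.
M_stacked <- sapply(layout, function(p) c(block_mean(p[[1]]), block_mean(p[[2]])))

# Weight 2 where two arrays were averaged, weight 1 for a single array.
W_stacked <- sapply(layout, function(p)
  rep(c(length(p[[1]]), length(p[[2]])), each = n_genes))

# One column of ones: every stacked value estimates the same treatment effect.
design <- matrix(1, ncol(M_stacked), 1)

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  # Treat the two blocks as within-array duplicate spots, n_genes rows apart.
  dc <- limma::duplicateCorrelation(M_stacked, design, ndups = 2,
                                    spacing = n_genes, weights = W_stacked)
  fit <- limma::lmFit(M_stacked, design, ndups = 2, spacing = n_genes,
                      correlation = dc$consensus.correlation,
                      weights = W_stacked)
  fit <- limma::eBayes(fit)
}
```

With the real data, the design would have one entry per patient column, as in `rep(1,6)` above, and the weights could equally be stored in the `weights` component of the MAList object.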

ADD COMMENT • link 21.0 years ago Gordon Smyth 52k
