Different levels of replicates and how to create a correct targets file out of that.


Johan Lindberg ▴ 90

@johan-lindberg-581


Sorry, I forgot to include a subject line in the mail I sent before.

Hello everyone.

I would really appreciate some comments, hints, or help with a fairly long question. I have an experiment consisting of 18 hybridizations. Knee joint biopsies (from different patients) taken before and after a certain treatment were hybridized on 30K cDNA arrays. What I want to find out is the effect of the treatment, not the difference between the patients. The problem is how to deal with different levels of replicates, and how to create a correct targets file when I have no common reference.

This is what the experimental set-up looks like:

Patient  Hybridization  Cy3                        Cy5
1        1A             Biopsy 1 before treatment  Biopsy 1 after treatment
         1B             Biopsy 1 after treatment   Biopsy 1 before treatment
3        2A             Biopsy 1 before treatment  Biopsy 1 after treatment
         2B             Biopsy 1 after treatment   Biopsy 1 before treatment
         3A             Biopsy 2 before treatment  Biopsy 2 after treatment
         3B             Biopsy 2 after treatment   Biopsy 2 before treatment
4        4A             Biopsy 1 before treatment  Biopsy 1 after treatment
         4B             Biopsy 1 after treatment   Biopsy 1 before treatment
         5A             Biopsy 2 before treatment  Biopsy 2 after treatment
         5B             Biopsy 2 after treatment   Biopsy 2 before treatment
5        6A             Biopsy 1 before treatment  Biopsy 1 after treatment
         6B             Biopsy 1 after treatment   Biopsy 1 before treatment
6        7A             Biopsy 1 before treatment  Biopsy 1 after treatment
         7B             Biopsy 1 after treatment   Biopsy 1 before treatment
7        8A             Biopsy 1 before treatment  Biopsy 1 after treatment
         8B             Biopsy 1 after treatment   Biopsy 1 before treatment
10       9A             Biopsy 1 before treatment  Biopsy 1 after treatment
         9B             Biopsy 1 after treatment   Biopsy 1 before treatment

As you can see, different patients have had one or two biopsies taken. I suspect it would be a mistake to include all of these in the targets file: if I have more measurements for a certain patient, wouldn't that bias the ranking of the B-statistic towards the patient with the most biopsies, since the differentially expressed genes in that patient will get a smaller variance?

My solution to the problem was to create an artificial M matrix twice as long as the original MA object. For the patients with two biopsies I averaged over the technical replicates (dye-swaps) and put the values from biopsy one and then the values from biopsy two in the matrix. For patients with just a technical replicate I put the values from hybridization 1A and then hybridization 1B into the matrix.

The M-values of that matrix object would look something like:

                  Patient 1         Patient 3                            ....
Rows 1-30000      Hybridization 1A  Average of hybridizations 2A and 2B  ....
Rows 30001-60000  Hybridization 1B  Average of hybridizations 3A and 3B  ....

After this I plan to use dupcor on the new matrix of M-values, as if I had a slide with replicate spots on it.

So far so good? Is this a good way of treating replicates on different levels, or does anyone have a better idea of how to do this? Comments please.

And now, how do I create a correct targets file when I have no common reference? I guess it would look something like this:

SlideNumber  Name     FileName   Cy3        Cy5
1            pat1_p   test1.gpr  Before_p1  After_p1
2            pat3_p   test2.gpr  Before_p2  After_p2
3            pat4_p   test3.gpr  Before_p3  After_p3
4            pat6_p   test4.gpr  Before_p4  After_p4
5            pat7_p   test5.gpr  Before_p5  After_p5
6            pat10_p  test6.gpr  Before_p6  After_p6

But when I want to make my contrast matrix I am lost, since I do not have anything to write as ref:

    design <- modelMatrix(targets, ref="????????")

If I redo the targets file as

SlideNumber  Name     FileName   Cy3       Cy5
1            pat1_p   test1.gpr  Before_p  After_p
2            pat3_p   test2.gpr  Before_p  After_p
3            pat4_p   test3.gpr  Before_p  After_p
4            pat6_p   test4.gpr  Before_p  After_p
5            pat7_p   test5.gpr  Before_p  After_p
6            pat10_p  test6.gpr  Before_p  After_p

wouldn't that be the same as treating this as a common reference design when it is not? And wouldn't that affect the variance of the experiment? How do I do this correctly?

I looked at the zebrafish example in the LIMMA user guide, but isn't that wrong as well, because technical and biological replicates are treated the same way in the targets file of the zebrafish example?

I realize that many of these questions should have been considered before conducting the lab part, but unfortunately they were not. So I will not be surprised if someone sends me the same quote as I got yesterday from a friend:

"To consult a statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of."
- R.A. Fisher, Presidential Address to the First Indian Statistical Congress, 1938

Best regards
/Johan Lindberg
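A minimal sketch of the dye-swap averaging step mentioned above, in base R. The M-values are made-up numbers for illustration, and the names `M`, `dye`, `M_avg` and `M_naive` are hypothetical. It assumes M is log2(Cy5/Cy3), so the dye-swapped array measures before minus after and has to be sign-flipped before averaging; the post does not say whether this flip was applied.

```r
# Made-up M-values (log2 Cy5/Cy3) for three genes on one dye-swap pair.
# Array A: Cy3 = before, Cy5 = after.  Array B: dyes swapped.
M <- cbind(A = c(1.0, -0.5, 0.2),
           B = c(-0.8, 0.7, 0.0))

# Orientation of each array: +1 if M measures after - before, -1 if swapped.
dye <- c(A = 1, B = -1)

# Flip the swapped array, then average: (M_A - M_B) / 2.
M_avg <- as.vector(M %*% dye) / 2

# A plain row mean ignores the orientation, so the treatment effect
# largely cancels between the two arrays of the pair.
M_naive <- rowMeans(M)
```

Comparing `M_avg` with `M_naive` shows why the orientation of each array has to be tracked, whether it is done by hand like this or through the design matrix.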

limma limma • 982 views

ADD COMMENT • link updated 20.1 years ago by Gordon Smyth 50k • written 20.1 years ago by Johan Lindberg ▴ 90


Gordon Smyth 50k

@gordon-smyth


WEHI, Melbourne, Australia

At 11:51 PM 30/03/2004, Johan Lindberg wrote:
> Sorry, I forgot to have a subject on the mail I sent before.
>
> Hello everyone.
> I would really appreciate some comments/hints/help with a pretty long question.

This question has been asked on the list before. See:

https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html

The simplest treatment in limma is to treat your experiment as having two factors, one with 10 levels indicating the patient and one with two levels, before or after. This treatment is analogous to a paired test or to a two-way analysis of variance.

An alternative would be to treat the patients as random effects. That would also be a correct treatment, and potentially a little more powerful, but also much more difficult, and I don't think you gain very much.

> I looked at the Zebra fish example in the LIMMA user guide but isnt that wrong as well. Because technical and biological replicates are treated the same way in the targets file of the zebra fish.

Dye-swap pairs are not necessarily technical replicates.

Gordon
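One possible reading of the two-factor (paired) treatment described above, sketched in base R. The patient labels and dye-swap pattern are transcribed from the layout in the question, which lists seven patient labels over 18 arrays. The names `patient`, `dye`, `design` and `contrast` are hypothetical, and the sketch assumes M = log2(Cy5/Cy3) and ignores the biopsy level within a patient.

```r
# Hybridization layout transcribed from the question (18 arrays, 1A ... 9B).
patient <- factor(c(1, 1, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 7, 10, 10))

# +1: Cy3 = before, Cy5 = after (M estimates after - before); -1: dye swap.
dye <- rep(c(1, -1), times = 9)

# One coefficient per patient: that patient's after-vs-before log-ratio.
# Multiplying by `dye` flips the sign of the rows for swapped arrays.
design <- model.matrix(~ 0 + patient) * dye
colnames(design) <- paste0("pat", levels(patient))

# Contrast that averages the per-patient effects into one treatment effect.
contrast <- matrix(1 / ncol(design), nrow = ncol(design), ncol = 1,
                   dimnames = list(colnames(design), "AfterVsBefore"))

# With limma loaded and a normalized MAList called MA (hypothetical name):
#   fit  <- lmFit(MA, design)
#   fit2 <- eBayes(contrasts.fit(fit, contrast))
#   topTable(fit2)
```

Because each patient enters the contrast through a single coefficient, a patient with two biopsies does not count twice in the averaged treatment effect. No common reference is needed here, since every array already compares before with after within one patient.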

ADD COMMENT • link 20.1 years ago Gordon Smyth 50k


Thank you for the answer, but I think that my situation is a little bit different.

First of all, I wonder about the answer that was given in https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html. He has 30 individuals with 4-6 replicates of each. This would mean that 120-180 hybridizations have been done. The example targets file that is given looks something like this:

Cy3       Cy5
Patient1  Control
Control   Patient1
Patient1  Control
Patient2  Control
Control   Patient2
...

Here is where I get confused, because it looks as if the technical replicates are included in the targets file (on the same level as the biological replicates) and should therefore also be included in the subsequent contrast matrix. But the contrast matrix given,

    cont.matrix <- matrix(1,30,1)

is just a column of 30 ones (he had 30 patients in the study), which suggests that only the true biological replicates would be included in the B-statistic analysis?

Back to my experiment. My real problem, I think, is that I have no common reference between the different samples. In the example above he has this "control" used in the hybridizations. But I have hybridized a biopsy from before and one from after treatment for each individual:

Cy3              Cy5
Patient1 before  Patient1 after
Patient1 after   Patient1 before
Patient2 before  Patient2 after
...

Since the effect I am looking for is the effect of the treatment, not the between-patient effect, would it be correct to use the same approach as in the example at https://stat.ethz.ch/pipermail/bioconductor/2003-December/003277.html even though I have no common reference?

Another question that was not answered is how to treat replicates on different levels, since I have 1-2 biopsies taken from different individuals plus technical replicates of each. Is there a way of dealing with this in LIMMA? Should one just average over the lower levels of replicates and then put only the true biological replicates in the targets file and contrast matrix?
Best regards
/ Johan Lindberg
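As a rough sketch of the random-effects alternative mentioned earlier in the thread, arrays from the same patient can be treated as a block instead of being averaged by hand. This assumes a current limma release in which `duplicateCorrelation()` and `lmFit()` accept a `block` argument. The M-values below are random noise used only so that the calls can run, and `patient`, `dye`, `design` and `M` are hypothetical names. It handles one level of blocking only, so biopsies within a patient are not modelled separately.

```r
# Layout from the question: 18 arrays in dye-swap pairs, 7 patient labels.
patient <- factor(c(1, 1, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 7, 7, 10, 10))
dye <- rep(c(1, -1), times = 9)        # -1 marks the dye-swapped array
design <- cbind(AfterVsBefore = dye)   # single treatment coefficient

# Placeholder M-values (random noise), NOT real data: 200 genes x 18 arrays.
set.seed(1)
M <- matrix(rnorm(200 * 18), nrow = 200, ncol = 18)

if (requireNamespace("limma", quietly = TRUE)) {
  # Estimate one consensus within-patient correlation across all genes.
  corfit <- limma::duplicateCorrelation(M, design, ndups = 1, block = patient)
  # Fit the treatment effect, allowing arrays within a patient to correlate.
  fit <- limma::lmFit(M, design, block = patient,
                      correlation = corfit$consensus.correlation)
  fit <- limma::eBayes(fit)
}
```

With real data, `M` would be replaced by the normalized MAList, and the consensus correlation should be checked before it is trusted.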

ADD REPLY • link 20.1 years ago Johan Lindberg ▴ 90
