Illumina Methylation. Normalization and statistics
@michael-walter-3141
Dear List,

We ran our first slide of Illumina's Infinium methylation arrays. After searching the archive, I still have some general questions about how best to analyze the data.

First of all, I would like to hear some opinions on normalization. In my personal and probably simplistic view, normalization should not be necessary, since the value you get from the array is a ratio that is inherent to the sample (unlike a classical two-color expression array, where you mix two samples to generate the expression ratio). Is this assumption correct, or am I missing some important aspect?

Anyway, I'd like to perform background normalization, which, as usual with Illumina arrays, results in some negative values. Does anyone have a neat solution for this problem, or shall I just skip those probes?

Do I have to correct for a dye effect, as for the GoldenGate methylation assay? Since the probes for methylated and unmethylated DNA incorporate the same dye, this shouldn't be an issue?

My final question is the most pressing: what kind of statistical test should I use? Since all the values are ratios between 0 and 1, I have a really bad feeling about simply running some t-tests. And if a t-test is the proper choice, shall I log-transform the data?

Any input and shared experience with this type of array is highly appreciated.

Best regards,
Mike

--
Dr. Michael Walter
The Microarray Facility
University of Tuebingen
Calwerstr. 7
72076 Tübingen/GERMANY
Tel.: +49 (0) 7071 29 83210
Fax. + 49 (0) 7071 29 5228
Microarray Normalization • 1.6k views
@sean-davis-490
United States
On Wed, Nov 19, 2008 at 10:18 AM, Michael Walter <michael.walter@med.uni-tuebingen.de> wrote:

> First of all, I would like to hear some opinions on normalization. In my
> personal and probably simplistic view, normalization should not be
> necessary, since the value you get from the array is a ratio that is
> inherent to the sample (unlike a classical two-color expression array,
> where you mix two samples to generate the expression ratio). Is this
> assumption correct, or am I missing some important aspect?

Unfortunately, there is a significant dye-bias issue. That is, one dye tends to be brighter than the other, and it appears that Illumina does not adequately correct for this bias.

> Anyway, I'd like to perform background normalization, which, as usual
> with Illumina arrays, results in some negative values. Does anyone have
> a neat solution for this problem, or shall I just skip those probes?

I have just been ignoring those probes.

> Do I have to correct for a dye effect, as for the GoldenGate
> methylation assay? Since the probes for methylated and unmethylated DNA
> incorporate the same dye, this shouldn't be an issue?

See above.

> My final question is the most pressing: what kind of statistical test
> should I use? Since all the values are ratios between 0 and 1, I have a
> really bad feeling about simply running some t-tests. And if a t-test is
> the proper choice, shall I log-transform the data?

The t-statistic should still be valid, I think. The assumptions that go into statistics like the t-statistic are not based on the distribution of the data, but on differences between values. I think these assumptions probably still hold in practice for these data. However, I have not tried to prove things one way or the other. Of course, if you are concerned about this, non-parametric testing will alleviate these concerns.

Sean
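For readers who want to see what "just ignoring those probes" could look like in practice, here is a minimal sketch in plain Python. It assumes the usual definition of beta as methylated signal divided by the sum of methylated and unmethylated signal; the probe names, the intensities, and the optional `offset` parameter are hypothetical illustrations and are not taken from this thread or from any Illumina software.

```python
from typing import Dict, Optional, Tuple


def beta_value(meth: float, unmeth: float, offset: float = 0.0) -> Optional[float]:
    """Return meth / (meth + unmeth + offset), or None if the probe is ignored.

    A probe is ignored when either background-subtracted intensity is
    negative, or when the denominator is not positive.
    """
    if meth < 0 or unmeth < 0:
        return None
    total = meth + unmeth + offset
    if total <= 0:
        return None
    return meth / total


# Hypothetical background-subtracted intensities: (methylated, unmethylated).
probes: Dict[str, Tuple[float, float]] = {
    "cg_a": (1200.0, 300.0),
    "cg_b": (-15.0, 800.0),   # negative after background subtraction
    "cg_c": (50.0, 950.0),
}

betas = {name: beta_value(m, u) for name, (m, u) in probes.items()}

# Keep only probes with a usable beta value.
kept = {name: b for name, b in betas.items() if b is not None}
print(kept)
```

Dropping the probe entirely is the simplest choice; clamping negative intensities to zero would be an alternative, but that is not what is described above.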
Ed Schwalbe ▴ 30
@ed-schwalbe-2495
Sean Davis <sdavis2 at ...> writes:

> Unfortunately, there is a significant dye-bias issue. That is, there is a
> propensity for one dye to be brighter than the other and it appears that
> Illumina does not adequately correct for this bias.

I agree, the dye bias should be a problem, but in my case, confirmatory bisulfite sequencing of interesting probes reported methylation values very close to those from the methylation array, so I stopped worrying about this.

> The t-statistic should still be valid, I think. The assumptions that go
> into statistics like the t-stat are not based on the distribution of the
> data, but on differences between values. I think these assumptions probably
> still hold in practice for these data. However, I have not tried to prove
> things one way or the other. Of course, if you are concerned about this,
> non-parametric testing will alleviate these concerns.
>
> Sean

I too had the same dilemma over which statistic to use with the GoldenGate methylation array. As I understand it, for a t-test to be valid, the underlying population has to be normally distributed, and this is manifestly not the case for the majority of probes, at least on the GoldenGate methylation array, not least because, as you say, the distribution is constrained between 0 and 1, with most probes being unmethylated (close to 0) or methylated (close to 1). The distribution is best described by a beta distribution or a mixture of beta distributions, depending on whether the probe distribution is uni- or bimodal.

Therefore, I used a Mann-Whitney test, followed by filtering on the magnitude of the difference in methylation value between clusters, to identify interesting probes. Having said that, carrying out t-tests did identify essentially the same set of interesting probes...

Good luck with your analysis, and I'd be interested in hearing whether people agree or disagree on the t-test question.

Best wishes,
Ed Schwalbe
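The two-step selection described above (a Mann-Whitney test, then a filter on the size of the methylation difference) can be sketched roughly as follows. This is an illustration in plain Python, not the code actually used for the GoldenGate analysis: the function names, the `alpha` and `min_delta` thresholds, and the example beta values are all hypothetical, the p-value uses the normal approximation without tie or continuity correction, and no multiple-testing correction is applied.

```python
import math
from statistics import mean


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Tied values receive mid-ranks. No tie or continuity correction is
    applied, so the p-value is approximate, especially for small groups.
    Returns (U statistic for x, approximate two-sided p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p


def interesting_probes(group_a, group_b, alpha=0.05, min_delta=0.2):
    """Keep probes that pass the test AND differ by at least min_delta in mean beta."""
    hits = []
    for probe in group_a:
        a, b = group_a[probe], group_b[probe]
        _, p = mann_whitney_u(a, b)
        delta = mean(b) - mean(a)
        if p < alpha and abs(delta) >= min_delta:
            hits.append(probe)
    return hits


# Hypothetical beta values for three probes in two sample clusters.
cluster_a = {
    "cg_1": [0.05, 0.08, 0.10, 0.07, 0.06],
    "cg_2": [0.50, 0.52, 0.49, 0.51, 0.53],
    "cg_3": [0.10, 0.11, 0.12, 0.13, 0.14],
}
cluster_b = {
    "cg_1": [0.80, 0.85, 0.90, 0.88, 0.83],
    "cg_2": [0.51, 0.50, 0.54, 0.48, 0.52],
    "cg_3": [0.15, 0.16, 0.17, 0.18, 0.19],
}

print(interesting_probes(cluster_a, cluster_b))
```

In this toy data, the third probe separates the clusters perfectly but shifts by only 0.05 in mean beta, so the magnitude filter removes it even though the rank test flags it.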
@michael-walter-3141
Dear Sean,

Thanks for your answers. I have one array probed with fully methylated DNA purchased from ZYMO. Here all beta values should be 1, which they aren't, of course. Can I use these values to normalize the rest of my arrays? Let's assume my fully methylated value is 0.7 and my actual value is 0.5; would I then correct my beta to 0.5/0.7?

Best regards,
Mike

> -----Original Message-----
> From: "Sean Davis" <sdavis2 at mail.nih.gov>
> Sent: 19.11.08 16:30:15
> To: "Michael Walter" <michael.walter at med.uni-tuebingen.de>
> CC: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Illumina Methylation. Normalization and statistics
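The arithmetic of the proposed rescaling is shown below as a small plain-Python sketch. It only spells out the 0.5/0.7 idea from the question and is not an established normalization method; the function name and the cap at 1.0 are assumptions added for illustration, and in practice the control value would presumably be taken per probe from the fully methylated array.

```python
def rescale_by_control(beta: float, control_beta: float, cap: float = 1.0) -> float:
    """Divide a sample beta by the beta observed for fully methylated DNA.

    The result is capped at `cap`, because a sample value above the
    control value would otherwise give a "methylation" greater than 1.
    """
    if control_beta <= 0:
        raise ValueError("control beta must be positive to rescale")
    return min(beta / control_beta, cap)


# The example from the question: control reads 0.7, sample reads 0.5.
print(rescale_by_control(0.5, 0.7))   # roughly 0.714
# A sample value above the control value is capped.
print(rescale_by_control(0.8, 0.7))
```

Note that this rescales only the top of the range; it says nothing about where an unmethylated control would land.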
On Thu, Nov 20, 2008 at 8:57 AM, Michael Walter <michael.walter@med.uni-tuebingen.de> wrote:

> Dear Sean,
>
> Thanks for your answers. I have one array probed with fully methylated DNA
> purchased from ZYMO. Here all beta values should be 1, which they aren't, of
> course. Can I use these values to normalize the rest of my arrays? Let's
> assume my fully methylated value is 0.7 and my actual value is 0.5; would I
> then correct my beta to 0.5/0.7?

Hi, Mike. That might work, but the approach we have taken is to correct the intensities, which show a significant dye bias and therefore cause a significant bias in beta.

Sean
Hi all,

Has anyone tried doing the analysis on log-ratios of the red and green channels rather than on beta? I would imagine they would suit existing tools such as limma, and the results can always be converted back to betas afterwards.

Just a thought.

Mark

On Thu, 2008-11-20 at 10:38 -0500, Sean Davis wrote:

> Hi, Mike. That might work, but the approach we have taken is to correct the
> intensities which show a significant dye bias, so a significant bias in
> beta.
>
> Sean
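The conversion between beta and a log-ratio, and back again, can be sketched in a few lines of plain Python. This assumes beta = a / (a + b) for the two signals a and b, so that log2(a / b) = log2(beta / (1 - beta)); the function names and the small clamp `eps` (used to avoid taking the log of zero at beta = 0 or 1) are assumptions for illustration and do not come from limma or any other package.

```python
import math


def beta_to_logratio(beta: float, eps: float = 1e-6) -> float:
    """Convert a beta value to log2(beta / (1 - beta)).

    Beta is first clamped to [eps, 1 - eps] so the logarithm is finite.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def logratio_to_beta(m: float) -> float:
    """Invert the transform: beta = 2**m / (2**m + 1)."""
    return 1.0 / (1.0 + 2.0 ** (-m))


for beta in (0.05, 0.5, 0.8):
    m = beta_to_logratio(beta)
    print(beta, round(m, 3), round(logratio_to_beta(m), 3))
```

Because the log-ratio is unbounded, values near 0 or 1 are stretched out, which is the property that would make such data a better fit for tools that assume roughly normal residuals.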