Analysing DNA methylation microarrays in Bioconductor
Paul Geeleher ★ 1.3k
@paul-geeleher-2679
Hello List,

I've inherited microarray data from a bunch of Agilent CpG island methylation arrays, 5 control and 5 disease samples. Basically, the arrays are set up as follows:

Cy3 - Methylated DNA from a patient's bowel, isolated using IP.
Cy5 - DNA (both methylated and unmethylated) from the bowel of the same person as above.

Basically, I'd like to identify regions of the genome (the individual reporters or, even better, the CpG islands [I think there are on average about 8 reporters per CpG island]) that are differentially methylated between the 5 disease and 5 control samples.

So I'm wondering what packages (if any) in Bioconductor I could be looking at to do this? I'd also welcome suggestions of any other software that might be out there.

Thanks,

Paul

--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
@claus-jurgen-scholz-3117
Hi Paul,

take a look at whether the packages rMAT, BAC and iChip fit your needs. Also, the packages Ringo and Starr provide functionality for ChIP-on-chip (or MeDIP in your case) analysis, but they are more focused on NimbleGen and Affymetrix arrays, respectively.

Best,
Claus-Jürgen

On 21.07.2010 at 19:04, Paul Geeleher wrote:
> ...
Thanks for your reply Claus,

What I've noticed, however, about these and every other tool I've found is that they seem to characterize the methylation pattern within a single sample, i.e. they say "this region appears to be methylated in this sample".

What I'd like is something that can compare the methylation levels between the samples, basically outputting a probability that a region/reporter is methylated in one phenotype and unmethylated in the other. It would be great if anyone could point me towards such a tool, or confirm that it doesn't actually exist?

Thanks,
Paul

On 2010/7/22, Claus-Jürgen Scholz wrote:
> ...
Hi,

On Fri, Jul 23, 2010 at 1:35 PM, Paul Geeleher wrote:
> What I'd like is something that can compare the methylation levels
> between the samples, basically outputting a probability that a
> region/reporter is methylated in one phenotype and unmethylated in the
> other. It would be great if anyone could point me towards such a tool,
> or confirm that it doesn't actually exist?

Well, I guess it's impossible to say that something *doesn't* exist (cf. the black swan), but if you have tools that tell you "this region is methylated" in a given sample, can't you do this yourself?

Say you use all of your replicate experiments to get a "golden answer" for regions methylated in disease, and for regions methylated in "normals".

I could imagine storing such info in an IRanges object (or an IRangesList, with one IRanges object for each chromosome), then just doing a setdiff(disease, normal) to see which ranges are methylated in disease and not normal.

Isn't that a start?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
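For readers who want to see what the range-difference step amounts to, here is a minimal sketch in plain Python. It is not the IRanges implementation, just an illustration of the same idea under stated assumptions: intervals are closed integer (start, end) pairs for a single chromosome, each input list is sorted and has no internal overlaps, and the coordinates are invented.

```python
def interval_setdiff(a, b):
    """Return the parts of the intervals in `a` not covered by any interval in `b`.

    Intervals are closed (start, end) integer pairs. Both lists are assumed
    to be sorted by start and free of internal overlaps.
    """
    result = []
    j = 0
    for start, end in a:
        cur = start
        # Skip intervals in b that end before this interval of a begins.
        while j < len(b) and b[j][1] < start:
            j += 1
        k = j
        # Walk over every interval in b that overlaps [start, end].
        while k < len(b) and b[k][0] <= end:
            b_start, b_end = b[k]
            if b_start > cur:
                result.append((cur, b_start - 1))
            cur = max(cur, b_end + 1)
            k += 1
        if cur <= end:
            result.append((cur, end))
    return result


# Hypothetical "methylated" calls on one chromosome for each phenotype.
disease = [(100, 200), (300, 400), (500, 600)]
normal = [(150, 180), (390, 450), (480, 620)]

print(interval_setdiff(disease, normal))
# [(100, 149), (181, 200), (300, 389)]
```

Note that this inherits the hard-threshold behaviour of whatever produced the calls: a region is either in the list or not.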
I understand your approach, but the main problem I'd see with such a thresholding approach is that you are highly likely to find regions that are just below the cutoff for being called "methylated" in one phenotype and just above the threshold in the other phenotype, and thus most likely not differentially methylated at all. Do you see what I mean?

Perhaps some kind of approach that labels each reporter as having a probability of methylation (and hence a probability of unmethylation), which can be compared across samples of a given phenotype to give a probability of all reporters being methylated/unmethylated in each phenotype, and then compares these probabilities between phenotypes to give a probability of "differential methylation". That's just off the top of my head; I think it makes sense, but I'm presuming nothing like that has actually been implemented?

Paul.

On Fri, Jul 23, 2010 at 6:45 PM, Steve Lianoglou wrote:
> ...
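One way the idea in this message could be made concrete is sketched below. This is only an illustration, not an existing tool: it assumes every reporter already has a per-sample probability of being methylated (how to obtain those is not addressed), and it assumes the samples are independent. The function name `prob_differential` is hypothetical.

```python
from math import prod


def prob_differential(p_disease, p_control):
    """Probability that a reporter is methylated in every sample of one group
    and unmethylated in every sample of the other group.

    p_disease, p_control: per-sample methylation probabilities for one
    reporter. Samples are treated as independent, which is an assumption.
    """
    all_meth_d = prod(p_disease)
    all_unmeth_d = prod(1 - p for p in p_disease)
    all_meth_c = prod(p_control)
    all_unmeth_c = prod(1 - p for p in p_control)
    # Either direction counts as "differential".
    return all_meth_d * all_unmeth_c + all_unmeth_d * all_meth_c


# A clear-cut reporter versus one sitting near a 0.5 cutoff in both groups.
clear = prob_differential([0.95] * 5, [0.05] * 5)
borderline = prob_differential([0.55] * 5, [0.45] * 5)
print(clear, borderline)
```

The borderline reporter, which a hard cutoff at 0.5 would call "methylated" in one group and "unmethylated" in the other, gets a much lower score than the clear-cut one.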
Hi, Paul.

How many samples do you have? And what are the sizes of the groups?

It seems to me that you have a number for each probe. You could do probewise testing between groups, or you could do some summarization first and then hypothesis testing. In any case, there are a number of ways to arrive at an n x p matrix where standard statistical tools could be used.

Sean

On Jul 23, 2010 11:54 AM, "Paul Geeleher" wrote:
> ...
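The "summarize first, then test" route can be sketched as follows. Everything here is an assumption made for illustration: reporter values are taken to be normalized log ratios, the reporter-to-island mapping is a hypothetical annotation, the summary is a per-sample median, and the test is an exact permutation test on the difference in group means (one possible nonparametric choice, not one named in the thread).

```python
from itertools import combinations
from statistics import median


def summarize_by_island(values, island_of):
    """Collapse reporter-level values to per-sample island medians.

    values: dict reporter_id -> list of per-sample values
    island_of: dict reporter_id -> island_id (hypothetical annotation)
    """
    grouped = {}
    for reporter, row in values.items():
        grouped.setdefault(island_of[reporter], []).append(row)
    return {
        island: [median(col) for col in zip(*rows)]
        for island, rows in grouped.items()
    }


def permutation_pvalue(row, group_a_idx):
    """Exact two-sided permutation p-value for a difference in group means."""
    n, k = len(row), len(group_a_idx)
    total = sum(row)

    def abs_diff(idx):
        sum_a = sum(row[i] for i in idx)
        return abs(sum_a / k - (total - sum_a) / (n - k))

    observed = abs_diff(group_a_idx)
    splits = list(combinations(range(n), k))
    # Small tolerance so that floating-point ties count as ties.
    extreme = sum(1 for idx in splits if abs_diff(idx) >= observed - 1e-12)
    return extreme / len(splits)


# Toy data: two reporters on one island; samples 0-4 disease, 5-9 control.
log_ratios = {
    "rep1": [2.1, 2.4, 1.9, 2.2, 2.6, 0.1, 0.3, -0.2, 0.0, 0.2],
    "rep2": [1.8, 2.0, 2.2, 2.5, 2.1, 0.2, 0.1, 0.0, -0.1, 0.4],
}
islands = summarize_by_island(log_ratios, {"rep1": "cgi1", "rep2": "cgi1"})
print(permutation_pvalue(islands["cgi1"], (0, 1, 2, 3, 4)))
```

With two groups of 5 there are only 252 ways to split the samples, so the smallest two-sided p-value this test can return is 2/252 (about 0.008), which matters once many thousands of features are tested.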
Thanks for the replies guys,

Sean, we have 5 disease samples and 5 control samples. Each array has 244k reporters located in CpG islands, averaging about 8 reporters per CpG island.

Jinyan, doesn't MEDME require some kind of calibration experiment? Needless to say, this hasn't been done and it's unlikely that there is money available to do it.

Paul.

On Fri, Jul 23, 2010 at 7:02 PM, Sean Davis wrote:
> ...
On Fri, Jul 23, 2010 at 12:51 PM, Paul Geeleher wrote:
> Sean, we have 5 disease samples and 5 control samples. Each array has
> 244k reporters located in CpG islands, averaging about 8 reporters per
> CpG island.

So, why not generate a 10 x 244k matrix (or a 10 x 30k matrix if you summarize over CpG islands) and then apply a hypothesis test of your choice (which might even need to be nonparametric) to the data? The value associated with each probe per sample could be either a raw value (after "appropriate normalization") or it could be derived from one of a number of ChIP-chip-like analysis packages (ACME, tilingArray, etc.).

Sean
Interesting. I'm not sure it'd make sense to use expression values (log ratios, I assume), because while there might be a statistically significant difference between the expression levels in each of the phenotypes, that doesn't necessarily imply that the reporters are methylated in one phenotype and unmethylated in the other, if you see what I mean?

I'm assuming that in the second case you are referring to a p-value for the probability of methylation of each reporter. Maybe this makes more sense, but I think you still need one phenotype to have a high probability of methylation and the other phenotype to have a high probability of unmethylation, along with a statistically significant difference in the p-values between the phenotypes?

Paul.

On Fri, Jul 23, 2010 at 8:02 PM, Sean Davis wrote:
> ...
Hi, Paul.

Thinking of methylation as a "black or white" affair might make sense for an individual cell or, perhaps, a perfectly homogeneous pool of cells (which probably does not exist), but for a tissue, I'm not sure that it is possible to think of methylation measurements that way. What you are measuring is the aggregation of methylation profiles associated with potentially different methylation states in the tissue pool; this could certainly result in a fully continuous measure of methylation. Therefore, finding statistical differences is still probably a useful way to think of the problem (though not the only one, obviously). Just like for gene expression, a statistically significant result does not imply a biologically important result, so you may want to stipulate a further filter that the difference between your two groups pass some arbitrary threshold.

Sean

On Fri, Jul 23, 2010 at 1:16 PM, Paul Geeleher wrote:
> ...
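The two-part filter described in this reply (statistical significance plus a minimum group difference) could look roughly like the sketch below. The inputs are assumed to be p-values from whatever test was used (ideally adjusted for multiple testing) together with group mean log ratios. The cutoffs 0.05 and 1.0 are arbitrary placeholders, and the function name is hypothetical.

```python
def select_candidates(results, alpha=0.05, min_abs_diff=1.0):
    """Keep features that are both significant and large in effect.

    results: iterable of (feature_id, p_value, mean_disease, mean_control)
    Returns (feature_id, difference) pairs, largest absolute difference first.
    """
    selected = []
    for feature_id, p_value, mean_disease, mean_control in results:
        diff = mean_disease - mean_control
        if p_value <= alpha and abs(diff) >= min_abs_diff:
            selected.append((feature_id, diff))
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


# Invented summary statistics for four CpG islands.
results = [
    ("cgi_1", 0.008, 2.2, 0.1),   # significant and large: kept
    ("cgi_2", 0.008, 0.4, 0.1),   # significant but small: dropped
    ("cgi_3", 0.400, 2.0, 0.0),   # large but not significant: dropped
    ("cgi_4", 0.010, -1.5, 0.5),  # significant, large, opposite direction: kept
]
for feature_id, diff in select_candidates(results):
    print(feature_id, round(diff, 2))
```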
I see exactly what you mean. The fact that these are CpG island arrays should also hopefully mean that adjacent reporters will be showing methylation/unmethylation in the same direction which should help to lock down the important genomic regions. Thanks for the food for thought! Paul. On Fri, Jul 23, 2010 at 8:24 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: > Hi, Paul. > Thinking of methylation as a "black or white" affair might make sense for an > individual cell or, perhaps, a perfectly homogeneous pool of cells (which > probably does not exist), but from a tissue, I'm not sure that it is > possible to think of methylation measurements that way. ?What you are > measuring is the aggregation of methylation profiles associated with > potentially different methylation states in the tissue pool; this could > certainly result in a fully continuous measure of methylation. ?Therefore, > finding statistical differences is still probably a useful way to think of > the problem (though not the only one, obviously). ?Just like for gene > expression, a statistically significantly result does not imply a > biologically important result, so you may want to stipulate a further filter > that the difference between your two groups pass some arbitrary threshold. > Sean > > On Fri, Jul 23, 2010 at 1:16 PM, Paul Geeleher <paulgeeleher at="" gmail.com=""> > wrote: >> >> Interesting. I'm not sure it'd make sense to use expression values >> (log ratios I assume) because while there might be a statistically >> significant difference between the expression levels in each of the >> phenotypes, that doesn't necessarily imply that the reporters are >> methylated in one phenotype and unmethylated in the other if you see >> what I mean? >> >> I'm assuming in the second case you are refering to a p-value for to >> the probability of methylation of each reporter. 
Maybe this makes more >> sense, but I think you still need one phenotype to have high >> probabilty of methylation and the other phenotype to have high >> probability of unmethylation, along with a statistically significant >> difference in the p-values between the phenotypes? >> >> Paul. >> >> On Fri, Jul 23, 2010 at 8:02 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> wrote: >> > >> > >> > On Fri, Jul 23, 2010 at 12:51 PM, Paul Geeleher <paulgeeleher at="" gmail.com=""> >> > wrote: >> >> >> >> Thanks for the replies guys, >> >> >> >> Sean, we have 5 disease samples and 5 control samples. Each array has >> >> 244k reporters located in CpG islands, averaging about 8 reporters per >> >> CpG island. >> >> >> > >> > So, why not generate a 10 x 244k matrix or 10 x 30k matrix if you >> > summarize >> > over CpG island and then apply a hypothesis test of your choice (which >> > might >> > need to be nonparametric, even) to the data? ?The value associated with >> > each >> > probe per sample could be either a raw value (after "appropriate >> > normalization") or it could be derived from a number of ChIP-chip like >> > analysis packages (ACME, tilingarray, etc.). >> > Sean >> > >> >> >> >> Jinyan, doesn't MEDME require some kind of calibration experiment? >> >> Needless to say this hasn't been done and it's unlikely that there is >> >> money there to do it. >> >> >> >> Paul. >> >> >> >> On Fri, Jul 23, 2010 at 7:02 PM, Sean Davis <sdavis2 at="" mail.nih.gov=""> >> >> wrote: >> >> > Hi, Paul.? How many samples do you have?? And what are the sizes of >> >> > the >> >> > groups? >> >> > >> >> > It seems to me that you have for each probe a number.? You could do >> >> > probewise testing between groups, or you could do some summarization >> >> > first >> >> > and then hypothesis testing.? In any case, there are a number of ways >> >> > to >> >> > arrive at an n x p matrix where standard statistical tools could be >> >> > used. 
>> >> > >> >> > Sean >> >> > >> >> > On Jul 23, 2010 11:54 AM, "Paul Geeleher" <paulgeeleher at="" gmail.com=""> >> >> > wrote: >> >> > >> >> > I understand your approach but the main problem I'd see with such a >> >> > thresholding approach is that you are highly likely to find regions >> >> > that are just below the cutoff to be called "methylated" in one >> >> > phenotype and just above the threshold in the other phenotype. Thus >> >> > most likely not differentially methylated at all. Do you see what I >> >> > mean? >> >> > >> >> > Perhaps some kind of approach that labels each reporter as having a >> >> > probability of methylation (and hence a probability of >> >> > unmethylation), >> >> > which can be compared across samples of a given phenotype to give a >> >> > probability of all reporters being methylated/unmethylated in each >> >> > phenotype, then compares these probabilities between phenotypes to >> >> > give a probability of "differential methylation". That's just off the >> >> > top of my head, I think it makes sense, but I'm presuming nothing >> >> > like >> >> > that has actually been implemented? >> >> > >> >> > Paul. >> >> > >> >> > On Fri, Jul 23, 2010 at 6:45 PM, Steve Lianoglou >> >> > <mailinglist.honeypot at="" gmail.com=""> wrote: >> >> >> Hi, >> >> >> >> >> >> ... >> >> > >> >> > -- >> >> > Paul Geeleher >> >> > School of Mathematics, Statistics and Applied Mathematics >> >> > National University of I... >> >> > >> >> > Bioconductor mailing list >> >> > Bioconductor at stat.math.ethz.ch >> >> > https://stat.ethz.ch/mailman/listinfo/bioco... 
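The matrix-plus-hypothesis-test idea in the quoted text above can be sketched roughly as follows. This is only an illustration in plain Python, not something proposed in the thread: the probe names, island names and values are made up, the per-island median is just one possible summary, and the exact permutation test stands in for "a hypothesis test of your choice" (it is nonparametric and cheap with 5 vs 5 samples, since there are only 252 ways to split 10 samples into two groups of 5).

```python
from itertools import combinations
from statistics import median


def summarize_by_island(probe_values, probe_to_island):
    """Collapse probe-level values to one value per island per sample (median)."""
    rows_by_island = {}
    for probe, values in probe_values.items():
        rows_by_island.setdefault(probe_to_island[probe], []).append(values)
    # zip(*rows) walks sample by sample across all probes of one island
    return {
        island: [median(column) for column in zip(*rows)]
        for island, rows in rows_by_island.items()
    }


def permutation_pvalue(values, n_disease):
    """Exact two-sided permutation test on the difference in group means.

    Assumes the first n_disease entries are disease samples and the
    remaining entries are controls.
    """
    n = len(values)
    total = sum(values)

    def mean_difference(indices):
        group_sum = sum(values[i] for i in indices)
        return group_sum / len(indices) - (total - group_sum) / (n - len(indices))

    observed = abs(mean_difference(range(n_disease)))
    splits = list(combinations(range(n), n_disease))
    # small tolerance so floating-point ties still count as "at least as extreme"
    hits = sum(1 for idx in splits if abs(mean_difference(idx)) >= observed - 1e-12)
    return hits / len(splits)


# Hypothetical normalized log-ratios: 5 disease samples first, then 5 controls.
probe_values = {
    "p1": [2.1, 1.9, 2.3, 2.0, 2.2, 0.1, 0.0, 0.2, -0.1, 0.1],
    "p2": [1.8, 2.0, 2.1, 1.9, 2.4, 0.2, 0.1, 0.0, 0.1, -0.2],
    "p3": [0.5, -0.3, 0.1, 0.4, -0.2, 0.3, -0.1, 0.2, 0.0, 0.1],
}
probe_to_island = {"p1": "cgi_1", "p2": "cgi_1", "p3": "cgi_2"}

island_values = summarize_by_island(probe_values, probe_to_island)
pvalues = {
    island: permutation_pvalue(values, n_disease=5)
    for island, values in island_values.items()
}
for island, p in sorted(pvalues.items()):
    print(island, round(p, 4))
```

One caveat with this particular test: with 5 vs 5 samples the smallest two-sided p-value it can produce is 2/252 (about 0.008), so it cannot survive a strict multiple-testing correction over tens of thousands of islands. A moderated or parametric test on the same matrix may be the more practical choice.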
Why not try MEDME?
Of course, defining three states ("methylated", "unmethylated" and "unsure") and only calling a reporter differentially methylated if it is methylated in every sample of one phenotype and unmethylated in every sample of the other might also work. The cut-offs could be fairly arbitrary though.

Paul.

On Fri, Jul 23, 2010 at 6:54 PM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
> I understand your approach but the main problem I'd see with such a
> thresholding approach is that you are highly likely to find regions
> that are just below the cutoff to be called "methylated" in one
> phenotype and just above the threshold in the other phenotype. Thus
> most likely not differentially methylated at all. Do you see what I
> mean?
>
> Perhaps some kind of approach that labels each reporter as having a
> probability of methylation (and hence a probability of unmethylation),
> which can be compared across samples of a given phenotype to give a
> probability of all reporters being methylated/unmethylated in each
> phenotype, then compares these probabilities between phenotypes to
> give a probability of "differential methylation". That's just off the
> top of my head, I think it makes sense, but I'm presuming nothing like
> that has actually been implemented?
>
> Paul.
>
> On Fri, Jul 23, 2010 at 6:45 PM, Steve Lianoglou
> <mailinglist.honeypot at gmail.com> wrote:
>> Hi,
>>
>> On Fri, Jul 23, 2010 at 1:35 PM, Paul Geeleher <paulgeeleher at gmail.com> wrote:
>>> Thanks for your reply Claus,
>>>
>>> What I've noticed however about these and every other tool I've found
>>> is that they seem to be able to characterize a methylation pattern in
>>> a sample. I.e. say "this region appears to be methylated in this
>>> sample".
>>>
>>> What I'd like is something that can compare the methylation levels
>>> between the samples, basically outputting a probability that a
>>> region/reporter is methylated in one phenotype and unmethylated in the
>>> other. It would be great if anyone could point me towards such a tool,
>>> or confirm that it doesn't actually exist?
>>
>> Well, I guess it's impossible to say that something *doesn't* exist
>> (cf. the black swan), but if you have tools that tell you "this region
>> is methylated" in a given sample, can't you do this yourself?
>>
>> Say you use all of your replicate experiments to get a "golden answer"
>> for regions methylated in disease and regions methylated in
>> "normals".
>>
>> I could imagine storing such info in an IRanges object (or an IRangesList,
>> with one IRanges object for each chromosome), then just doing a
>> setdiff(disease, normal) to see which ranges are methylated in disease
>> and not normal.
>>
>> Isn't that a start?
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> --
> Paul Geeleher
> School of Mathematics, Statistics and Applied Mathematics
> National University of Ireland
> Galway
> Ireland
> --
> www.bioinformaticstutorials.com

--
Paul Geeleher
School of Mathematics, Statistics and Applied Mathematics
National University of Ireland
Galway
Ireland
--
www.bioinformaticstutorials.com
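The three-state idea at the top of this post could look something like the sketch below. It is a plain-Python illustration only: the function names, the example values and the two cut-offs (`low`, `high`) are all hypothetical, and as the post says, any real cut-offs would be fairly arbitrary.

```python
def call_state(value, low, high):
    """Label one reporter value as methylated, unmethylated or unsure."""
    if value >= high:
        return "methylated"
    if value <= low:
        return "unmethylated"
    return "unsure"


def is_differential(disease, control, low, high):
    """True only if every sample in one group is methylated and every
    sample in the other group is unmethylated."""
    disease_states = {call_state(v, low, high) for v in disease}
    control_states = {call_state(v, low, high) for v in control}
    return (
        disease_states == {"methylated"} and control_states == {"unmethylated"}
    ) or (
        disease_states == {"unmethylated"} and control_states == {"methylated"}
    )


# Hypothetical enrichment values for one reporter, 5 disease and 5 control.
clear = is_differential(
    disease=[2.0, 1.8, 2.2, 1.9, 2.1],
    control=[0.1, 0.0, 0.3, -0.1, 0.2],
    low=0.5,
    high=1.5,
)

# Values that sit just either side of 1.0: a single cut-off at 1.0 would
# separate the groups, but with an "unsure" band they are not called.
borderline = is_differential(
    disease=[1.05, 1.02, 1.08, 1.04, 1.06],
    control=[0.95, 0.98, 0.92, 0.96, 0.94],
    low=0.5,
    high=1.5,
)

print(clear, borderline)
```

The "unsure" band is what addresses the concern in the quoted message about regions sitting just below the cut-off in one phenotype and just above it in the other. The price is that a single noisy sample in either group is enough to block a call.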