Testing for no difference


Gustavo Fernández Bayón ▴ 440

@gustavo-fernandez-bayon-5300

Last seen 9.9 years ago

Spain

Hi everybody.

I have a set of only 5 samples of Illumina27k methylation data. We have extracted some info from the probes, but now the researcher in charge of the project wants something that could support his idea that the five samples are practically equivalent with respect to their methylation levels.

I know that the sample is quite small. Intuitively, if you plot densities from the 5 samples, they are almost equal. The problem is that I do not know how to attach a statistical significance to this observation (yes, as always, there is the "I need a p-value" problem).

1) I did PCA on both beta values and M-values, and found that the first principal component accounts for between 90 and 91% of the total variance. In the biplot, the five samples appear to be very close.

2) I asked a statistician friend for advice, and we tried the following: probe by probe, we used a leave-one-out approach, calculating a confidence interval from 4 of the samples and checking whether the value of the remaining sample falls within the interval. Then, for each probe, I counted the number of times a methylation value fell outside the confidence interval, only to find out that nearly 53% of the probes contain, in this sense, 'outliers'.

At first it surprised me, but then I noticed, by plotting the outliers against the samples, that these 'outliers' were uniformly distributed among samples, which I think is again intuitive: there are indeed differences among samples (statistical differences, maybe not biological ones), but no single sample differs globally from the others.

These differences might be related to technical noise, so I was thinking of imposing a minimum difference and then testing again for outliers. Would this be OK?

Is there any method I can use to try to show there is no difference among the samples? Or should I stay with the graphs and the intuitive description in the text?

Thanks. Any help or hint would be much appreciated.

Regards,
Gustavo

---------------------------
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
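The leave-one-out check in point 2 can be examined on simulated data. The sketch below is only an illustration under stated assumptions: it assumes the interval was a t-based 95% confidence interval for the mean of the four remaining values (the post does not say which interval was used), it draws pure Gaussian noise instead of real methylation values, and the function names are hypothetical. It suggests that this rule flags a large share of probes even when all five samples come from the same distribution, because a confidence interval for a mean is narrower than the spread of a single new observation.

```python
import math
import random
import statistics

# Two-sided 95% Student t quantile for 3 degrees of freedom (4 values - 1).
# Hard-coded because the standard library has no t distribution.
T_975_DF3 = 3.182


def loo_outlier_flags(values):
    """For each value, check whether it lies outside the 95% CI for the
    mean of the remaining values. Assumes a list of exactly 5 values."""
    flags = []
    for i, x in enumerate(values):
        rest = values[:i] + values[i + 1:]
        centre = statistics.mean(rest)
        half_width = T_975_DF3 * statistics.stdev(rest) / math.sqrt(len(rest))
        flags.append(abs(x - centre) > half_width)
    return flags


def fraction_of_probes_flagged(n_probes=5000, seed=1):
    """Simulate probes with NO true difference between the 5 samples and
    return the fraction of probes with at least one flagged value."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_probes):
        probe = [rng.gauss(0.0, 1.0) for _ in range(5)]
        if any(loo_outlier_flags(probe)):
            flagged += 1
    return flagged / n_probes


if __name__ == "__main__":
    print(fraction_of_probes_flagged())
```

If the interval actually used was a prediction interval, or had a different level, the flagged fraction would differ, so the simulation is a sanity check on the rule rather than a reconstruction of the original analysis.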


updated 13.4 years ago by Albyn Jones ▴ 70 • written 13.4 years ago by Gustavo Fernández Bayón ▴ 440


Albyn Jones ▴ 70

@albyn-jones-3850

Last seen 11.3 years ago

You might look into "tests of equivalence". One common procedure involves defining an interval (-a, a) and doing two one-sided tests ("TOST") of the null hypotheses H_0: delta >= a and H_0: delta <= -a. Rejecting both at level alpha is equivalent to checking that the 1 - 2*alpha confidence interval for the difference is contained in the specified interval.

albyn

On 7/23/12 12:52 AM, Gustavo Fernández Bayón wrote:
> Hi everybody.
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
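As a rough illustration of the TOST idea, here is a minimal Python sketch. It is not a recommendation for the 5-sample case: it assumes a reasonably large set of roughly independent differences (so that a normal approximation to the mean is acceptable), and the function name `tost_z`, the margin `a` and the example numbers are all made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_z(diffs, a, alpha=0.05):
    """Two one-sided tests for |mean difference| < a (normal approximation).

    Returns the TOST p-value (the larger of the two one-sided p-values)
    and the 1 - 2*alpha confidence interval for the mean difference.
    """
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)
    z = NormalDist()
    # H_0: delta <= -a  against  delta > -a
    p_lower = 1.0 - z.cdf((d_bar + a) / se)
    # H_0: delta >= a   against  delta < a
    p_upper = z.cdf((d_bar - a) / se)
    p_value = max(p_lower, p_upper)
    q = z.inv_cdf(1.0 - alpha)
    ci = (d_bar - q * se, d_bar + q * se)
    return p_value, ci


if __name__ == "__main__":
    # Hypothetical differences between two samples.
    diffs = [0.01, -0.02, 0.015, -0.005, 0.0, 0.01, -0.01, 0.005]
    print(tost_z(diffs, a=0.05))
```

With this construction, the p-value is below alpha exactly when the returned interval lies inside (-a, a), which is the equivalence mentioned in the reply. The choice of `a` has to come from outside the data, for example from what is considered a biologically negligible difference.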

written 13.4 years ago by Albyn Jones ▴ 70


Hi Albyn.

As I have already answered to W. Huber in this thread, I think that TOST could be a good choice here, and I am going to give it a try. The problem is that I am comfortable with the two-sample description of TOST, but what about multiple groups? Should I follow a procedure similar to the one I was using before, that is, leave-one-out for each of the probes, but with TOSTs instead of ordinary tests?

Thanks for your answer.

Regards,
Gus

On Monday, 23 July 2012 at 17:29, jones wrote:
> You might look into "tests of equivalence" [...]

written 13.4 years ago by Gustavo Fernández Bayón ▴ 440


There are several procedures; I'm not sure which is best. Take a look at Chapter 7 of

Stefan Wellek, Testing Statistical Hypotheses of Equivalence and Noninferiority, Second Edition, CRC/Chapman & Hall.

albyn

On Tue, Jul 24, 2012 at 09:02:52AM +0200, Gustavo Fernández Bayón wrote:
> Hi Albyn.
> [...]

--
Albyn Jones
Reed College
jones at reed.edu

written 13.4 years ago by Albyn Jones ▴ 70


Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 12 weeks ago

EMBL European Molecular Biology Laborat…

Gustavo,

it seems that your question can be rephrased as 'there is no evidence for these 5 samples forming any (nontrivial, i.e. different from size 1 or 5) clusters'. If so, have a look at the package 'clue':
http://cran.r-project.org/web/packages/clue/vignettes/clue.pdf

Of course, proving the absence of something (e.g., a systematic difference) is very difficult, and in your case, as in most, it is probably better to aim for saying that any difference that may exist is smaller than some (more or less arbitrary) bound.

Best wishes
Wolfgang

On Jul/23/12 9:52 AM, Gustavo Fernández Bayón wrote:
> Hi everybody.
> [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
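The suggestion of bounding the size of any difference, rather than proving its absence, can be turned into a simple descriptive summary. The following Python sketch is only one possible reading of that idea: the helper names, the toy beta values and the threshold 0.05 are illustrative assumptions, not values from the thread.

```python
import math


def probe_ranges(probes):
    """Per-probe spread: largest minus smallest value across the samples."""
    return [max(values) - min(values) for values in probes]


def fraction_within(probes, delta):
    """Fraction of probes whose spread across samples is below delta."""
    ranges = probe_ranges(probes)
    return sum(1 for r in ranges if r < delta) / len(ranges)


def range_quantile(probes, q):
    """Empirical q-quantile of the per-probe spread (0 < q <= 1), i.e. a
    bound that a fraction q of the probes does not exceed."""
    ranges = sorted(probe_ranges(probes))
    index = max(math.ceil(q * len(ranges)) - 1, 0)
    return ranges[index]


if __name__ == "__main__":
    # Toy beta values: 4 probes (rows) measured in 5 samples (columns).
    probes = [
        [0.10, 0.12, 0.11, 0.10, 0.13],
        [0.80, 0.82, 0.79, 0.81, 0.80],
        [0.50, 0.65, 0.52, 0.51, 0.50],
        [0.30, 0.31, 0.30, 0.29, 0.30],
    ]
    print(fraction_within(probes, delta=0.05))
    print(range_quantile(probes, q=0.5))
```

A statement of the form "a fraction q of the probes differ by less than X across the five samples" is descriptive, not a test, and the threshold still has to be justified separately, for instance from the known technical variability of the platform.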

written 13.4 years ago by Wolfgang Huber ★ 13k


Btw, a less complex way to approach such an analysis is highlighted here:

http://nsaunders.wordpress.com/2012/07/23/we-really-dont-care-what-statistical-method-you-used/

Best wishes
Wolfgang

On Jul/23/12 5:09 PM, Wolfgang Huber wrote:
> Gustavo,
> [...]

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber

written 13.4 years ago by Wolfgang Huber ★ 13k


I am not an expert in statistics. Why not use ANOVA?

Jack

2012/7/23 Wolfgang Huber <whuber@embl.de> wrote:
> Btw, a less complex way to approach such an analysis is highlighted here:
> [...]

written 13.4 years ago by yao chen ▴ 210


Hi Jack.

I am not an expert in statistics either, but I found this short paper, which gave me a simple hint about the drawbacks of ANOVA for equivalence testing:

http://pareonline.net/pdf/v16n7.pdf

Regards,
Gus

On Monday, 23 July 2012 at 20:02, Yao Chen wrote:
> I am not an expert of statistic. Why not use ANOVA?
> [...]

written 13.4 years ago by Gustavo Fernández Bayón ▴ 440


Hi Wolfgang.

I had already seen that paper, thanks to @yokofakun on Twitter. How could that happen? Is it a manuscript? Although I have seen strange things in the past, this one has really struck me. A colleague here at the lab showed me a manuscript yesterday in which the chromatin immunoprecipitation step was described with a figure drawn by hand in pencil. At least I should say that the guy knew how to draw a histone. ;)

Regards,
Gus

On Monday, 23 July 2012 at 17:31, Wolfgang Huber wrote:
> Btw, a less complex way to approach such an analysis is highlighted here:
> http://nsaunders.wordpress.com/2012/07/23/we-really-dont-care-what-statistical-method-you-used/
>
> Best wishes
> Wolfgang

13.4 years ago Gustavo Fernández Bayón ▴ 440


Hi Wolfgang.

On Monday, 23 July 2012 at 17:09, Wolfgang Huber wrote:
> Gustavo,
>
> it seems that your question can be rephrased as 'there is no evidence
> for these 5 samples forming any (nontrivial, i.e. different from size 1
> or 5) clusters'. If so, have a look at the package 'clue':
> http://cran.r-project.org/web/packages/clue/vignettes/clue.pdf

I have just had a look at it. Thanks for the link. I did not even know that cluster ensembles existed. Sometimes it is difficult just to stay up to date with all the methods that are available. That's why I find these conversations on the BioC list so enriching.

At the beginning, that was one of the ideas I was looking for, i.e., to show that there was no clear way of separating the 5 samples in general.

> Of course, proving the absence of something (e.g., a systematic
> difference) is very difficult, and in your case as in most it's probably
> better to aim for saying that any difference that may exist is smaller
> than some (more or less arbitrary) measure.

That seems very appropriate to me. That is related to the TOST method that Albyn Jones pointed out in one of the other answers, isn't it? I know it is a kind of double testing with two one-tailed tests, and that it takes a parameter stating the amount of difference we are willing to admit before we say that the individuals are not equivalent. I have to admit this is one of the ideas I am most comfortable with. Maybe I can give it a try on my data, and then write it up in a professional way, that is, just as if I knew what I was talking about. ;)

> Best wishes
> Wolfgang

Thanks for your answer.

Regards,
Gus
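To make the TOST idea mentioned above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the procedure anyone in this thread actually ran: the function name `tost_paired`, the simulated beta values and the margin `delta=0.05` are all hypothetical, the probes are treated as independent pairs (which is unlikely to hold strictly for methylation data), and a normal approximation is used because the number of probes is large.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def tost_paired(x, y, delta):
    """Two one-sided tests (TOST) for equivalence of paired measurements.

    H0: |mean(x - y)| >= delta   versus   H1: |mean(x - y)| < delta.
    Uses a normal approximation, so it assumes a large number of pairs.
    Returns the TOST p-value, i.e. the larger of the two one-sided p-values.
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference

    std_normal = NormalDist()
    # One-sided test against the lower margin (H0: mean difference <= -delta)
    p_lower = 1.0 - std_normal.cdf((d + delta) / se)
    # One-sided test against the upper margin (H0: mean difference >= +delta)
    p_upper = std_normal.cdf((d - delta) / se)
    return max(p_lower, p_upper)


# Hypothetical data: simulated beta values for two samples over 2000 probes,
# where the second sample differs from the first only by small random noise.
rng = random.Random(0)
sample_a = [rng.uniform(0.05, 0.95) for _ in range(2000)]
sample_b = [a + rng.gauss(0.0, 0.02) for a in sample_a]

# delta is the largest mean difference we are willing to call "equivalent".
p_equiv = tost_paired(sample_a, sample_b, delta=0.05)
print(f"TOST p-value for equivalence within +/-0.05: {p_equiv:.3g}")
```

A small p-value here supports the claim that the mean difference lies inside the chosen margin; the conclusion depends entirely on `delta`, which has to be justified on biological or technical grounds rather than chosen after looking at the data.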

13.4 years ago Gustavo Fernández Bayón ▴ 440
