Question: Variance stabilization of m-values
6.8 years ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:
Use eBayes with trend=TRUE later in the pipeline, then variance stabilization may not be needed.

Gordon

> Date: Wed, 1 Aug 2012 15:20:56 +0200
> From: Gustavo Fernández Bayón <gbayon at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Variance stabilization of m-values
>
> Hi everybody.
>
> I am working with Illumina 450k methylation data. I am currently
> cleaning a data set, getting rid of XY probes, etc., and I would like to
> do a non-specific filtering and preserve only 20% of the probes, those
> with the higher variability (as seen in Chapter 7 of the Bioconductor
> Case Studies book).
>
> In the book, they create a meanSdPlot() and proceed as the variance is
> not dependent on the mean (to a significant degree).
>
> Trying to follow that procedure, I have converted my beta values to
> M-values, and then called meanSdPlot(). It shows, for my data, that
> there is a relationship between mean and variance, i.e. the line with
> the median is not horizontal. Of course, if I create a meanSdPlot with
> the beta values, the effect is greater, due to their heteroscedasticity.
>
> Question: Is it correct to use a variance stabilization transformation
> (as the one in justvsn) on the M-values in order to discard low-variance
> probes?
>
> Any hint will be much appreciated.
>
> Regards,
> Gus
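As a rough illustration of the check described in the question, the sketch below converts beta values to M-values with the usual logit transform, M = log2(beta / (1 - beta)), and summarises the per-probe standard deviation across bins of ranked means, loosely in the spirit of meanSdPlot(). The simulated beta matrix, the clamping constant and the ten-bin summary are all assumptions made for illustration; none of them comes from the original data or from the vsn package.

```r
# Simulated beta values (assumption): each probe has its own methylation
# level, with beta-distributed noise across samples.
set.seed(42)
n_probes <- 1000
n_samples <- 8
true_level <- runif(n_probes, 0.05, 0.95)
beta <- matrix(rbeta(n_probes * n_samples,
                     shape1 = true_level * 20,
                     shape2 = (1 - true_level) * 20),
               nrow = n_probes)

# Keep beta strictly inside (0, 1) so the logit stays finite
# (the clamping constant is an arbitrary choice).
beta <- pmin(pmax(beta, 1e-6), 1 - 1e-6)

# Logit transform from beta values to M-values.
M <- log2(beta / (1 - beta))

# Per-probe mean and standard deviation across samples.
probe_mean <- rowMeans(M)
probe_sd <- apply(M, 1, sd)

# Median SD within ten bins of ranked means: a flat profile would
# suggest little dependence of the spread on the mean.
bins <- cut(rank(probe_mean), breaks = 10, labels = FALSE)
median_sd_by_bin <- tapply(probe_sd, bins, median)
print(round(median_sd_by_bin, 3))
```

Replacing `M` with `beta` in the last few lines gives the corresponding profile for beta values, which makes the two scales easy to compare on the same data.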
6.8 years ago, Tim Triche (United States) wrote:
The mean-variance plot should be far "more" horizontal with M-values than beta-values; have you plotted it against total intensity? You end up going down the rabbit hole eventually due to copy number variation, but plotting M-value variance against the mean, the line of best fit is nearly flat across the range of values. The variance is more U-shaped (as opposed to the "n" shape with beta values).

You could try an arcsin transform

asin(sqrt(beta))

if your primary goal is to stabilize the variance, though Dr. Smyth's suggestion will probably be better for sensitivity in the end.

Just a thought. There are many ways to transform a proportion and they all have relative strengths and weaknesses in practice.
nb. I should have written:

"the variance of the M-value variance as a function of the mean is more U-shaped towards the extremes, versus the n shape for betas"

My apologies.

--t
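To make the arcsine suggestion concrete, here is a minimal sketch of why asin(sqrt(p)) is called variance stabilizing. It assumes the proportions behave like binomial proportions with a fixed number of trials, which is only a rough stand-in for real beta values from an array; the trial count, the probabilities and the number of draws are all arbitrary choices for illustration.

```r
# Assumption: proportions are binomial counts divided by a fixed number
# of trials. Real methylation beta values need not follow this model.
set.seed(1)
n_trials <- 50
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)

# For each true proportion, compare the spread of the raw proportion
# with the spread after the arcsine square-root transform.
sd_by_p <- sapply(p, function(prob) {
  prop <- rbinom(5000, size = n_trials, prob = prob) / n_trials
  c(raw = sd(prop), arcsine = sd(asin(sqrt(prop))))
})
colnames(sd_by_p) <- p
print(round(sd_by_p, 3))
```

Under the binomial assumption the raw standard deviation peaks at p = 0.5 (the "n" shape), while the transformed values have a spread of roughly 1 / (2 * sqrt(n_trials)) regardless of p.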
Hi Tim.

Sorry for the late reply. (OFFTOPIC: my third child decided to be born :) the day after I asked the question on the list, so I have been on paternity leave and really had no time to answer emails.)

The arcsin proposal is very interesting. I'll give it a try too, although, as I replied to Dr. Smyth, I am no longer sure the curve is as important as I first thought. I am currently re-working that pipeline, because I have to remember the exact point where I was twenty days ago, and that is sometimes hard :)

Thank you very much for your hints.

Regards,
Gus
6.8 years ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:
Dear Gustavo,

the two issues:
- whether filtering of probes by overall variance is admissible and helpful for your analysis
- whether the variance depends on the mean
are unrelated.

If I understand your question correctly (and I am not sure I do), then you should filter on the overall variance of the M values, and need not worry about the mean-variance relationship. Can you check the paper on this topic ("Independent filtering increases detection power for high-throughput experiments", http://www.pnas.org/content/107/21/9546.long) and get back if it is still unclear?

Best wishes
Wolfgang
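The non-specific filter discussed here can be sketched in a few lines of base R: compute the overall variance of each probe across all samples, without looking at any group labels, and keep the top 20%. The matrix below is simulated and the probe and sample names are placeholders; only the 20% figure comes from the original question.

```r
# Simulated M-value matrix (assumption): 500 probes by 6 samples.
set.seed(7)
M <- matrix(rnorm(500 * 6), nrow = 500,
            dimnames = list(paste0("probe", 1:500),
                            paste0("sample", 1:6)))

# Overall per-probe variance, computed across all samples at once.
probe_var <- apply(M, 1, var)

# Keep the 20% most variable probes.
cutoff <- quantile(probe_var, probs = 0.80)
M_filtered <- M[probe_var > cutoff, , drop = FALSE]
print(dim(M_filtered))
```

Because the filter statistic ignores the experimental design, it does not involve the mean-variance plot at all, which is the point made in the reply above.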
6.8 years ago by
Spain
Hi Gordon.

Sorry for the late reply. I'll try your solution and see if it works. The fact is, maybe I was too alarmed by the graph, and the relationship is not that important.

Thank you very much.

Gus
6.8 years ago, Brent Pedersen (United States) wrote:
On Thu, Aug 2, 2012 at 5:19 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Use eBayes with trend=TRUE later in the pipeline, then variance
> stabilization may not be needed.
>
> Gordon

Is that recommendation only for beta values?

When using M-values as a matrix, fit$Amean is not set, so this gives an error when using eBayes with trend=TRUE.

Or should one just manually set fit$Amean = rowMeans(M)?

thanks,
-Brent
Dear Brent,

No, Amean <- rowMeans(M) wouldn't have the desired effect. Amean should reflect average intensity, so it would be necessary to compute Amean from the original intensities used to compute the M-values or beta values.

Note that I don't have any first-hand experience with methylation arrays, so this is just to suggest something that could be tried.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
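One way to try this suggestion is sketched below. It is not a tested recipe for methylation arrays: the simulated methylated and unmethylated intensities, the offset of 100, and the definition of average intensity as the mean of the two log2 channels (by analogy with A-values on two-colour arrays) are all assumptions. The limma calls run only if the package is installed.

```r
# Hypothetical raw intensities for the methylated and unmethylated
# channels (simulated; real values would come from the array).
set.seed(3)
n_probes <- 300
n_samples <- 6
meth <- matrix(rgamma(n_probes * n_samples, shape = 2, scale = 2000),
               nrow = n_probes)
unmeth <- matrix(rgamma(n_probes * n_samples, shape = 2, scale = 2000),
                 nrow = n_probes)

offset <- 100  # assumed stabilising offset, not from the thread

# M-values from the two channels.
M <- log2((meth + offset) / (unmeth + offset))

# Assumed definition of average log-intensity per probe.
A <- 0.5 * (log2(meth + offset) + log2(unmeth + offset))
Amean <- rowMeans(A)

if (requireNamespace("limma", quietly = TRUE)) {
  group <- factor(rep(c("control", "case"), each = n_samples / 2),
                  levels = c("control", "case"))
  design <- model.matrix(~ group)
  fit <- limma::lmFit(M, design)
  # Use the intensity-based covariate rather than rowMeans(M).
  fit$Amean <- Amean
  fit <- limma::eBayes(fit, trend = TRUE)
  print(limma::topTable(fit, coef = 2, number = 5))
}
```

With trend = TRUE, eBayes uses fit$Amean as the covariate for the prior variance trend, so whatever is stored there determines which notion of "mean" the trend is fitted against.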