Variance stabilization of m-values
@gordon-smyth
WEHI, Melbourne, Australia
Use eBayes with trend=TRUE later in the pipeline; then variance stabilization may not be needed.

Gordon

> Date: Wed, 1 Aug 2012 15:20:56 +0200
> From: Gustavo Fernández Bayón <gbayon at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] Variance stabilization of m-values
>
> Hi everybody.
>
> I am working with Illumina 450k methylation data. I am currently
> cleaning a data set, getting rid of XY probes, etc., and I would like to
> do a non-specific filtering and preserve only 20% of the probes, those
> with the highest variability (as seen in Chapter 7 of the Bioconductor
> Case Studies book).
>
> In the book, they create a meanSdPlot() and proceed as if the variance
> is not dependent on the mean (to a significant degree).
>
> Trying to follow that procedure, I have converted my beta values to
> M-values, and then called meanSdPlot(). It shows, for my data, that
> there is a relationship between mean and variance, i.e. the line with
> the median is not horizontal. Of course, if I create a meanSdPlot with
> the beta values, the effect is greater, due to their heteroscedasticity.
>
> Question: Is it correct to use a variance stabilization transformation
> (such as the one in justvsn) on the M-values in order to discard
> low-variance probes?
>
> Any hint will be much appreciated.
>
> Regards,
> Gus
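A minimal sketch of the check described in the question, assuming simulated beta values in place of real 450k data. It converts beta values to M-values with the usual base-2 logit and summarises the per-probe standard deviation within bins of ranked mean, which is a rough base-R stand-in for what meanSdPlot() displays. The simulation settings, the clamping constant, and the helper mean_sd_trend() are illustrative assumptions, not something taken from the thread.

```r
# Simulated beta values (probes x samples); a real analysis would use the
# beta matrix produced by the array preprocessing step instead.
set.seed(1)
n_probes  <- 2000
n_samples <- 12
probe_mean <- runif(n_probes, 0.05, 0.95)
beta <- matrix(rbeta(n_probes * n_samples,
                     shape1 = probe_mean * 50,
                     shape2 = (1 - probe_mean) * 50),
               nrow = n_probes)

# Keep beta strictly inside (0, 1) so the logit is finite (assumed constant).
beta <- pmin(pmax(beta, 1e-6), 1 - 1e-6)

# M-value: base-2 logit of the beta value.
M <- log2(beta / (1 - beta))

# Hypothetical helper: median per-probe SD within bins of ranked mean.
mean_sd_trend <- function(x, n_bins = 10) {
  mu  <- rowMeans(x)
  s   <- apply(x, 1, sd)
  bin <- cut(rank(mu, ties.method = "first"), breaks = n_bins, labels = FALSE)
  tapply(s, bin, median)
}

trend_beta <- mean_sd_trend(beta)
trend_M    <- mean_sd_trend(M)

# A flat sequence suggests little mean-variance dependence.
print(round(trend_beta, 3))
print(round(trend_M, 3))
```

Comparing the two printed sequences gives a quick numerical counterpart to looking at the median line in the plot.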
Tim Triche ★ 4.2k
@tim-triche-3561
United States
The mean-variance plot should be far "more" horizontal with M-values than with beta-values; have you plotted it against total intensity? You end up going down the rabbit hole eventually due to copy number variation, but when plotting M-value variance against the mean, the line of best fit is nearly flat across the range of values. The variance is more U-shaped (as opposed to the "n" shape with beta values).

You could try an arcsin transform

asin(sqrt(beta))

if your primary goal is to stabilize the variance, though Dr. Smyth's suggestion will probably be better for sensitivity in the end.

Just a thought. There are many ways to transform a proportion and they all have relative strengths and weaknesses in practice.
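To illustrate why asin(sqrt(beta)) is suggested as a variance-stabilizing transform, here is a small simulation. It assumes, purely for illustration, that each proportion behaves like a binomial fraction with a fixed number of trials; real methylation beta values are not exactly binomial, so this is a sketch of the principle rather than a model of 450k data. The value of n_trials and the probability grid are arbitrary choices.

```r
set.seed(2)
n_trials <- 50                       # hypothetical trials behind each proportion
probs    <- seq(0.1, 0.9, by = 0.2)  # true proportions to compare

# 5000 simulated proportions per true value (one column per value).
sims <- sapply(probs, function(prob) rbinom(5000, n_trials, prob) / n_trials)

# Spread on the raw scale depends strongly on the true proportion ...
sd_raw  <- apply(sims, 2, sd)
# ... while on the arcsine square-root scale it is roughly constant.
sd_asin <- apply(asin(sqrt(sims)), 2, sd)

print(round(rbind(probs, sd_raw, sd_asin), 3))
```

Under the binomial assumption the transformed standard deviation is approximately 1 / (2 * sqrt(n_trials)) regardless of the true proportion, which is what "stabilized" means here.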
nb. I should have written:

"the variance of the M-value variance as a function of the mean is more U-shaped towards the extremes, versus the n shape for betas"

My apologies.

--t
Hi Tim.

Sorry for the late reply. (OFFTOPIC: my third child decided to be born :) the day after I asked the question on the list, so I have been on paternity leave and really had no time to answer e-mails.)

The arcsin proposal is very interesting. I'll give it a try too, although, as I have answered to Dr. Smyth, I am not sure the curve is really as important as I thought at first. I am currently re-working that pipeline, because I have to remember the exact point where I was twenty days ago, and that is sometimes hard :)

Thank you very much for your hints.

Regards,
Gus
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…
Dear Gustavo,

the two issues:
- whether filtering of probes by overall variance is admissible and helpful for your analysis
- whether the variance depends on the mean
are unrelated. If I understand your question correctly (and I am not sure I do), then you should filter on the overall variance of the M values, and need not worry about the mean-variance relationship.

Can you check the paper on this topic ("Independent filtering increases detection power for high-throughput experiments", http://www.pnas.org/content/107/21/9546.long) and get back if it is still unclear?

Best wishes
Wolfgang
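The filtering step discussed here, keeping the 20% of probes with the highest overall variance of the M-values, can be sketched in base R as follows. The M-value matrix is simulated and the probe names are made up; only the 20% fraction comes from the thread.

```r
set.seed(3)
# Hypothetical M-value matrix: 1000 probes x 8 samples.
M <- matrix(rnorm(1000 * 8), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), NULL))

# Overall (group-blind) variance of each probe across all samples.
probe_var <- apply(M, 1, var)

# Keep probes whose variance lies above the 80th percentile,
# i.e. the most variable 20%.
cutoff     <- quantile(probe_var, probs = 0.80)
keep       <- probe_var > cutoff
M_filtered <- M[keep, , drop = FALSE]

dim(M_filtered)
```

Because the variance is computed across all samples without reference to group labels, the filter uses no information about the comparison that is tested later.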
Hi Wolfgang,

First of all, I apologize for the late reply. As I mentioned in a previous mail, there have been major reasons that kept me away from e-mail.

> the two issues:
> - whether filtering of probes by overall variance is admissible and
>   helpful for your analysis
> - whether the variance depends on the mean
> are unrelated. If I understand your question correctly (and I am not
> sure I do), then you should filter on the overall variance of the M
> values, and need not worry about the mean-variance relationship.

I was thinking about that when I noticed that the curve showing that relationship had almost no influence on a filtering of that kind. That is, if I want to get rid of the probes whose variance is low, those behave quite homogeneously in the graph. Well, I will have to re-think this, as I currently have to re-create the pipeline.

> Can you check the paper on this topic ("Independent filtering increases
> detection power for high-throughput experiments",
> http://www.pnas.org/content/107/21/9546.long) and get back if it is
> still unclear?

I'll give it a read. Thank you very much for the link.

Thank you, as always, for your interesting hints and references.

Regards,
Gus
@gustavo-fernandez-bayon-5300
Spain
Hi Gordon.

Sorry for the late reply. I'll try your solution and see if it works. Fact is, maybe I was too alarmed about the graph, and the relationship is not that important.

Thank you very much.

Gus
@brent-pedersen-4815
United States
On Thu, Aug 2, 2012 at 5:19 PM, Gordon K Smyth wrote:
> Use eBayes with trend=TRUE later in the pipeline, then variance
> stabilization may not be needed.
>
> Gordon

Is that recommendation only for beta values?

When using M-values as a matrix, fit$Amean is not set, so this gives an error when using eBayes with trend=TRUE.

Or should one just manually set fit$Amean = rowMeans(M)?

Thanks,
-Brent
Dear Brent,

No, Amean <- rowMeans(M) wouldn't have the desired effect. Amean should reflect average intensity, so it would be necessary to compute Amean from the original intensities used to compute the M-values or beta values.

Note that I don't have any first-hand experience with methylation arrays, so this is just to suggest something that could be tried.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth
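One way the suggestion above might be tried is sketched below. It assumes the limma package is installed and that methylated and unmethylated intensity matrices are available; here they are simulated, and the names meth and unmeth, the +1 offset in the M-value, the two-group design, and the choice of mean log2 total intensity for Amean are all illustrative assumptions rather than a recommendation made in the thread.

```r
library(limma)

set.seed(4)
n_probes <- 500

# Hypothetical methylated / unmethylated intensities (probes x 6 samples).
meth   <- matrix(rgamma(n_probes * 6, shape = 5, scale = 400), nrow = n_probes)
unmeth <- matrix(rgamma(n_probes * 6, shape = 5, scale = 400), nrow = n_probes)

# M-values from the intensities (the offset of 1 is an assumed convention).
M <- log2((meth + 1) / (unmeth + 1))

# Two groups of three samples each.
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

fit <- lmFit(M, design)

# Average log2 total intensity per probe, used as the trend covariate
# in place of anything derived from the M-values themselves.
fit$Amean <- rowMeans(log2(meth + unmeth))

# Empirical Bayes moderation with an intensity-dependent prior variance.
fit <- eBayes(fit, trend = TRUE)

topTable(fit, coef = 2, number = 5)
```

With trend = TRUE the prior variance is allowed to vary with fit$Amean, so what that covariate represents determines which mean-variance trend is being modelled.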