Analyzing "differential variability" of methylation (and gene expression)


Simone ▴ 190

@simone-5854

Last seen 5.9 years ago

Hi!

My question is of a rather general nature; nevertheless, I hope someone can help.

I am currently trying to analyze "differential variability" of gene expression and, above all, methylation data (Illumina microarray data: 27K and 450K BeadChip) in the context of aging, i.e. I would like to see whether the variability of methylation increases (or decreases) in healthy individuals as they age. I would like to do this gene-wise, to see whether, and which, genes show increased or decreased variability.

Several published studies in this context use different methods for this kind of analysis:

First of all, there is the ordinary F-test. But since it requires data that do not depart from normality, I think it is not applicable in my case. For one of my datasets (~500 samples after outlier removal) I performed Shapiro-Wilk tests for the ~27,000 CpGs and found that more than 26,200 CpGs do not have normally distributed values (FDR 0.05). I think this is a usual observation when working with methylation data.

Other analyses investigating similar questions used Bartlett's test, but it requires normal distributions as well. I also read something about this, either here or on the R mailing list, where the Ansari-Bradley test was proposed for this kind of analysis. So maybe the Ansari-Bradley test would be a good choice, although so far I have not seen any publication that uses it for variability analyses.

Another approach that was recommended to me was not to build age groups and compare them to each other (I used two "extreme" age groups, i.e. very young vs. very old samples), but to fit some kind of fixed-effects model for analyzing variability with age. Maybe this would be the best option, as we have all the age information available (in years or even months), and this way we do not lose any of the information we actually have. But I am not quite sure how to model variability. How would one do this?

Recently a study was also published in which the authors say they used linear models and calculated "methylation deviance" as the squared distance of the residuals of every marker from the population mean. Again, I am not sure about this, and the description in the methods section is quite short.

Any suggestions about the "best" way to analyze changes in the variability of methylation (and gene expression) values? Which strategy would you recommend?

Best,
Simone
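For the two-group comparison (very young vs. very old) described above, one distribution-free option is a permutation test on a robust spread statistic, followed by a Benjamini-Hochberg adjustment across CpGs. The sketch below is only an illustration under stated assumptions and is not a method proposed in this thread: the function names, the simulated beta values, and the choice of the mean absolute deviation from the median as the spread statistic are all hypothetical.

```python
import random
import statistics


def spread(values):
    """Mean absolute deviation from the median (a robust measure of spread)."""
    med = statistics.median(values)
    return statistics.fmean(abs(v - med) for v in values)


def permutation_spread_test(young, old, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in spread between groups.

    Each group is centred at its own median first, so that a shift in mean
    methylation alone is not reported as a change in variability.
    """
    med_young = statistics.median(young)
    med_old = statistics.median(old)
    centred = [v - med_young for v in young] + [v - med_old for v in old]
    n_young = len(young)
    observed = abs(spread(centred[n_young:]) - spread(centred[:n_young]))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(centred)
        diff = abs(spread(centred[n_young:]) - spread(centred[:n_young]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Simulated beta values for one hypothetical CpG: the old group is both
# shifted and four times more variable than the young group.
rng = random.Random(1)
young = [rng.gauss(0.50, 0.02) for _ in range(40)]
old = [rng.gauss(0.55, 0.08) for _ in range(40)]
p_changed = permutation_spread_test(young, old)

# Adjusting a small, made-up set of per-CpG p-values.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_changed, adjusted)
```

Note that the smallest attainable p-value is 1 / (n_perm + 1), so with about 27,000 CpGs the number of permutations would have to be much larger than the default used here before anything could pass an FDR threshold.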

Microarray • 1.2k views

Updated 11.1 years ago by Pekka Kohonen ▴ 190 • written 11.1 years ago by Simone ▴ 190


Pekka Kohonen ▴ 190

@pekka-kohonen-5862

Last seen 6.3 years ago

Sweden

Hello,

I also find this an interesting question. I don't have a solution handy, but I would say that linear models seem preferable because they let you account for other sources of variation that are not age-related, for instance batch effects and other confounding variables or covariates (smoking? BMI?).

Best regards,
Pekka
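One way to read the linear-model suggestion is as a two-stage fit: first regress methylation on age plus the nuisance covariates, then regress the squared residuals on age, in the spirit of a Breusch-Pagan-type auxiliary regression. The sketch below makes several assumptions that do not come from this thread: a single CpG, one binary batch covariate, simulated data, hypothetical function names, and a permutation p-value swapped in for the usual chi-square reference. It is not necessarily what the "methylation deviance" study did.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ols_residuals(y, columns):
    """Residuals of y regressed on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    beta = solve_linear_system(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, row))
            for row, yi in zip(design, y)]


def slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def variance_trend(y, age, covariates, n_perm=2000, seed=0):
    """Slope of squared residuals on age, with a permutation p-value."""
    # Stage 1: remove the mean effects of age and the nuisance covariates.
    residuals = ols_residuals(y, [age] + covariates)
    squared = [r * r for r in residuals]
    # Stage 2: does the squared residual change with age?
    observed = slope(age, squared)
    rng = random.Random(seed)
    shuffled = squared[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(age, shuffled)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated CpG: the mean drifts with age and batch, and the noise standard
# deviation grows from 0.01 at age 20 to 0.07 at age 80.
rng = random.Random(42)
age = [rng.uniform(20, 80) for _ in range(200)]
batch = [float(i % 2) for i in range(200)]
meth = [0.4 + 0.001 * a + 0.05 * b + rng.gauss(0, 0.01 + 0.001 * (a - 20))
        for a, b in zip(age, batch)]
trend, p_value = variance_trend(meth, age, [batch])
print(trend, p_value)
```

A positive slope here would point to variability increasing with age after the batch effect and the age-related shift in the mean have been removed; using age as a continuous variable avoids discarding the samples between the two extreme groups.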

11.1 years ago • Pekka Kohonen ▴ 190


"To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!" (Box, 1953)

The quote is taken from an article on mixture models for simultaneously detecting differences in the mean and variance, by Haim Bar and Jim Booth, which is by far the best I've read. I was going to keep this to myself, but it appears that there is no need, so dig out the paper and get reading. It is a good paper with good examples, and the code is (was?) available from Haim Bar. I don't know if he has made a package out of it yet.

The paper was published in Statistical Applications in Genetics and Molecular Biology, a journal which used to be fantastic (under the Berkeley Electronic Press) but under de Gruyter seems to be suffering rather a lot. C'est la vie.

--
A model is a lie that helps you see the truth.
Howard Skipper <http://cancerres.aacrjournals.org/content/31/9/1173.full.pdf>

11.1 years ago • Tim Triche ★ 4.2k
