Analyzing "differential variability" of methylation (and gene expression)


Simone ▴ 190


Hello!

First of all, I'm very sorry I could not reply earlier, and thank you very much for answering my question on how to analyze differences in the variability of methylation and expression values with aging.

> From: Pekka Kohonen
>
> I also find this an interesting question. I don't have a solution handy but I would say that using linear models seems preferable because you can discount in the model other sources of variation. For instance batch effects, other confounding variables/covariates (smoking?, bmi?) and so on that are not age-related.

Yes, this is also what we were thinking about. But I am not sure how to model gene-wise variability in such an approach. Whenever I want to add a variability measure for a gene to a simple linear model, I first have to build age groups (otherwise I don't know where increased or decreased variability with increasing age should come from; maybe I am missing something, as my experience with building such models is very limited so far). But building groups is exactly what I was advised to avoid.

> From: Tim Triche, Jr.
>
> From an article on mixture models for simultaneously detecting differences in the mean and variance, by Haim Bar and Jim Booth, which is by far the best I've read.

Thank you very much for mentioning this very interesting paper. I don't know why I hadn't found it before; I should have, because it deals (almost) exactly with the problem I want to solve. And it sounds really good! The only catch is that the method described in the paper would also require me to build age groups and then compare the differences between two groups. But as I already wrote above, I'm not sure whether it would really be possible to work without age groups anyway.

As I was very interested in the approach of the paper and could find neither the corresponding code nor an R package, I contacted Haim Bar via e-mail. He told me that he could provide the code and that he is currently generalizing the model to handle more groups and also covariates (including continuous variables), which is what I was looking for. So this will probably be the way to go for me.

However, I think I'll also have to go back to those who told me not to use age groups for my analysis and discuss the issue with them, to get things clearer.

Thank you for your help.

Best,
Simone
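On the question of how variability could enter a linear model without age groups: one generic option is a two-stage residual regression in the spirit of the Glejser test for heteroscedasticity. It is not the Bar and Booth mixture model and was not proposed by anyone in this thread. The sketch below first removes the mean trend of methylation on continuous age, then regresses the absolute residuals on age; a positive slope in the second stage suggests that the spread grows with age. The data are simulated and the function names `ols_fit` and `variability_trend` are hypothetical.

```python
import random

def ols_fit(x, y):
    """Simple least-squares fit y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def variability_trend(age, value):
    """Stage 1: remove the mean trend. Stage 2: regress |residual| on age."""
    intercept, slope = ols_fit(age, value)
    abs_resid = [abs(v - (intercept + slope * t)) for t, v in zip(age, value)]
    _, spread_slope = ols_fit(age, abs_resid)
    return spread_slope

# Simulated values for two hypothetical CpG sites (assumption, not real data):
# one whose noise SD grows with age, one whose noise SD is constant.
rng = random.Random(1)
age = [rng.uniform(0, 100) for _ in range(500)]
growing = [0.5 + 0.001 * t + rng.gauss(0, 0.01 + 0.001 * t) for t in age]
constant = [0.5 + 0.001 * t + rng.gauss(0, 0.05) for t in age]

slope_growing = variability_trend(age, growing)    # expected to be clearly positive
slope_constant = variability_trend(age, constant)  # expected to be close to zero
```

In a real analysis the first stage would also include covariates such as batch or estimated cell proportions, and the second-stage slope would need a proper significance test across all genes; both are left out here to keep the idea visible.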


ADD COMMENT • link updated 11.6 years ago by Kasper Daniel Hansen ★ 6.5k • written 11.6 years ago by Simone ▴ 190


Kasper Daniel Hansen ★ 6.5k


For aging and methylation be very aware of the need to correct for cell type composition, if the measurements are on blood.

Kasper

ADD COMMENT • link 11.6 years ago Kasper Daniel Hansen ★ 6.5k


Dear Kasper,

thank you very much for your hint. Indeed, we are (in some cases) working on whole blood measurements, and we are aware of this problem, but in those cases I fear we have no way to solve it. Furthermore, it is very difficult to get datasets with large sample sizes where cell sorting was performed, at least for now (also because, I think, you'd need plenty of blood to purify certain cell types).

Although this is slightly off-topic: since you write about "correct for cell type composition", do you know of a good way to do so? Or, put differently, do you know of a good paper estimating the subpopulation changes in blood across the whole aging process? I saw plenty of papers dealing with pediatrics, but there does not seem to be much available spanning the whole range of human ages (although the biggest and fastest changes seem to occur during the first years of life).

Simone

ADD REPLY • link 11.6 years ago Simone ▴ 190


Read Houseman's paper and use the sorted cells from Juha Kere's lab to calibrate. If you have expression data for your samples you can further decrease the variance of your estimates.

Lymphoid bias in younger subjects gradually transitions to a myeloid bias in the elderly. Or not-so-gradually depending on what they've been exposed to. This itself correlates with an overall decrease in methylation but with focal hypermethylation and deterioration of the local correlation structure in older (especially mitotically older) populations.

Anyways. This is of a piece with sensitive tests for differential variability.

--t

ADD REPLY • link 11.6 years ago Tim Triche ★ 4.2k


> Read Houseman's paper and use the sorted cells from Juha Kere's lab to calibrate.

Thank you very much for the recommendations. I read both papers and, as far as I can see, I could use the method of Houseman et al. with the signature provided (which they say is not affected by age) to estimate cell type proportions for every sample I have, and then add these values as covariates to my model to see whether changes in cell type distribution have an effect.

Furthermore, for my 450K dataset I think I could apply Houseman's method to the purified cell data of Reinius/Kere to obtain such a signature for the 450K platform as well, and do the same again.

Right?

What I found interesting is that Reinius et al. write in their paper: "Methylation in the promoter CpG islands tends to be low and very similar among all the cell types and for those CpG sites, measurements in whole blood would reflect the methylation status across cell populations". Wouldn't this mean that for data obtained with the 27K microarray, whose probes are located in CpG islands of gene promoters, there would be no such "subpopulation change effect" confounding methylation measures of whole blood?

However, I will see what happens in my data (both 27K and 450K). Reinius et al. also say that "the differential cell count in whole blood was similar for all six donors", maybe because they did not cover such a wide age range (24 to 52 years, while I have data from newborns to about 100 years old).

Although data are available for only seven or eight main leukocyte cell types, I think it will be very good (and important) to see what happens when adjusting (at least) for those subtypes. I always find it odd when new papers on methylation and aging claim that their observations in whole blood samples are not due to blood composition changes, citing the 2010 paper of Rakyan et al. In that paper, CD14+ monocytes and CD4+ T cells were sorted, and the authors concluded that there is a significant correlation between the two for hypermethylated regions but _not_ for hypomethylation, when the change predominantly occurring in blood with age is _hypo_methylation! The conclusion of Rakyan's paper actually was that hypomethylated regions "probably reflect aging-associated changes in the relative proportion of cell subtypes in whole blood", and not the contrary ...

Now that I am aware of Houseman's method and the data, I can at least try to do a little better. Thanks a lot for your help!

Simone
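The plan of estimating cell type proportions per sample from a reference signature can be sketched as follows. This is a simplified stand-in for the constrained projection of Houseman et al., not their implementation: it solves a non-negative least-squares problem by plain projected gradient descent rather than with a dedicated constrained solver, and it skips the selection of discriminating CpGs. The reference values, the three unnamed cell types and the function name `estimate_proportions` are made up for illustration.

```python
def estimate_proportions(reference, mixture, steps=2000, lr=0.5):
    """Non-negative least squares by projected gradient descent.

    reference: one row per CpG; each row holds the mean methylation of
               every purified cell type at that CpG.
    mixture:   whole-blood methylation values at the same CpGs.
    Returns one estimated proportion per cell type.
    """
    n_types = len(reference[0])
    n_cpgs = len(mixture)
    w = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        # Residual of the current fit at every CpG
        resid = [sum(r * wi for r, wi in zip(row, w)) - m
                 for row, m in zip(reference, mixture)]
        for k in range(n_types):
            grad = sum(row[k] * e for row, e in zip(reference, resid)) / n_cpgs
            w[k] = max(0.0, w[k] - lr * grad)  # clip so proportions stay >= 0
    return w

# Hypothetical reference signature: 6 CpGs x 3 purified cell types
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.2],
    [0.2, 0.8, 0.2],
    [0.2, 0.2, 0.8],
]

# Simulated noise-free whole-blood sample with known composition
true_props = [0.6, 0.3, 0.1]
mixture = [sum(r * p for r, p in zip(row, true_props)) for row in reference]

estimated = estimate_proportions(reference, mixture)
```

The estimated proportions for each sample would then enter the age model as additional covariates, as described above. With real data the estimates are noisy and need not sum to one, so that constraint is usually handled explicitly.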

ADD REPLY • link 11.6 years ago Simone ▴ 190


This was for charm data: http://biostatistics.oxfordjournals.org/content/13/1/166 but could be a starting point after adjusting for cell types? As implemented (http://biostat.jhsph.edu/~ajaffe/code/vmrFinder.R), it only allows for a dichotomous covariate.

ADD REPLY • link 11.6 years ago Brent Pedersen ▴ 110


> This was for charm data:
> http://biostatistics.oxfordjournals.org/content/13/1/166
> but could be a starting point after adjusting for cell-types? As implemented (http://biostat.jhsph.edu/~ajaffe/code/vmrFinder.R) , it only allows for dichotomous covariate.

Yes, I know this (very nice) paper, thanks! But since we don't have Charm data, and I had moreover been told not to build age groups, I opened this discussion.

Simone

ADD REPLY • link 11.6 years ago Simone ▴ 190
