Hi everybody,
I have a question about the analysis of methylation microarray data.
I first asked it on Biostar (http://www.biostars.org/p/64405/#64521),
and somebody there suggested that I also post it here. The question is:
"[..] I am currently working on a DNA methylation microarray analysis
project. I have 20 samples measured on an Illumina 450k array. After
some initial preprocessing and non-specific filtering, I reduced the
data set to 47k probes. Using minfi, I fit a linear regression model to
each probe, with sample age as the only continuous predictor and the
methylation level as the response (in the form of M-values, the logit
transformation of the beta values). P-values are then adjusted to
control the FDR, and I keep the significant probes as the final subset
of differentially methylated probes.
Now we want to divide these probes into several groups according to
their variability trend. That is, we want to detect whether, for a
given probe, the methylation values converge or diverge with respect to
age. At first I was thinking about using the White test, or an
equivalent test for heteroskedasticity, to see whether the squared
residuals behave this way. But then I thought that if the squared
residuals behave in a non-normal way, it could be due to several other
factors, such as outliers or influential points. Am I right to distrust
this approach, or could the White test work in this context?
A colleague told me that another possible way would be to use mixed
models with a variance function. That way I could model not only the
change in methylation level but also the change in variability. If I
choose this way, then I should define some age groups and partition the
samples among them, shouldn't I? Is this a better approach in this case
than the basic linear regression? [..]"
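
For reference, the White test mentioned in the quoted question can be
sketched for a single probe as below. This is a minimal illustration in
plain Python, not the minfi or R workflow itself; the function names
(solve, ols, white_test) are made up for the sketch. It assumes one
predictor (age), so the auxiliary regression of the squared residuals
uses age and age^2 and the statistic n * R^2 is compared with a
chi-square distribution with 2 degrees of freedom. The sign of the last
returned value is only a rough indicator of direction (positive for
divergence with age, negative for convergence), and the test remains
sensitive to outliers, as the question notes.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(columns, y):
    """Least squares of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    return coefs, fitted

def white_test(age, meth):
    """Return (LM statistic, p-value, slope of squared residuals on age)."""
    n = len(age)
    mean_age = sum(age) / n
    x = [a - mean_age for a in age]      # centring leaves the test unchanged
    _, fitted = ols([x], meth)           # per-probe model: meth ~ age
    e2 = [(m - f) ** 2 for m, f in zip(meth, fitted)]
    _, aux_fit = ols([x, [v * v for v in x]], e2)   # e^2 ~ age + age^2
    mean_e2 = sum(e2) / n
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - f) ** 2 for v, f in zip(e2, aux_fit))
    lm = n * (1.0 - ss_res / ss_tot)     # n * R^2 of the auxiliary regression
    p_value = math.exp(-lm / 2.0)        # chi-square upper tail, 2 df
    trend, _ = ols([x], e2)              # direction of the variance trend
    return lm, p_value, trend[1]
```

With only 20 samples the chi-square approximation is rough, so the
p-value is better read as a ranking aid than as an exact significance
level.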
ADDITIONAL NOTES:
I really like the mixed model approach, and I have managed to play a
little bit with the nlme package and the varFunc class family in order
to study the heteroskedasticity, but I still think I am missing
something. I have also been reading excerpts from the book
"Mixed-Effects Models in S and S-PLUS" by Pinheiro and Bates, and I
think I can understand the examples, but I find it hard to adapt them
to the methylation scenario.
For example, say I have the methylation values for one probe. The
obvious simple linear model is "meth ~ age". So far, so good. But if I
want to convert it to a mixed model, which covariate can be declared as
a random effect? I have been playing with age as a random factor, but I
am not sure whether that is a good model. In the end, what I want is to
be able to use lme() and pass it a varFunc in order to see whether it
can fit a model for the variability trend.
If this cannot be modeled as a mixed model, is there any tool to fit a
linear model with a variance function, just as the lme() function does?
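
To make the idea concrete, here is a minimal sketch of a linear model
with a variance function and no random effects. It assumes the
exponential form Var(e_i) = sigma^2 * exp(2 * delta * age_i), the same
form as the varExp class in nlme, and fits delta by maximising the
profile likelihood. It is written in plain Python for illustration only
and is not how lme() works internally. The function names, the search
bracket (-0.2, 0.2) and the assumption that the likelihood has a single
maximum inside that bracket are all choices made for this sketch.

```python
import math

def wls_line(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x with weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def profile_loglik(delta, x, y):
    """Log-likelihood at delta with b0, b1, sigma^2 profiled out (constants dropped)."""
    n = len(x)
    w = [math.exp(-2.0 * delta * xi) for xi in x]   # inverse variance weights
    b0, b1 = wls_line(x, y, w)
    sigma2 = sum(wi * (yi - b0 - b1 * xi) ** 2
                 for wi, xi, yi in zip(w, x, y)) / n
    return -0.5 * n * math.log(sigma2) - delta * sum(x)

def fit_variance_trend(age, meth, lo=-0.2, hi=0.2, tol=1e-6):
    """Golden-section search for delta; returns (delta, LR statistic vs delta = 0)."""
    mean_age = sum(age) / len(age)
    x = [a - mean_age for a in age]      # centring only rescales sigma^2
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if profile_loglik(c, x, meth) > profile_loglik(d, x, meth):
            b = d                        # maximum lies in [a, d]
        else:
            a = c                        # maximum lies in [c, b]
    delta = (a + b) / 2.0
    gain = profile_loglik(delta, x, meth) - profile_loglik(0.0, x, meth)
    return delta, 2.0 * gain
```

A positive delta would point to methylation values diverging with age
and a negative delta to convergence. The second returned value is a
likelihood-ratio statistic against constant variance, which is
approximately chi-square with 1 degree of freedom in large samples; with
20 samples that approximation should be treated with caution.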
Any help will be much appreciated.
Regards,
Gus