Hi everybody.
Imagine the following scenario: I have methylation data stored as an
ExpressionSet, with 40 samples and 450K probes (Illumina). The samples
are divided into two classes, and I would like to characterize families
of probes according to their behavior. That is, I would like to find one
set of probes that is hypermethylated with respect to the covariate that
separates the classes, another whose variability increases from one
class to the other, etc.
I have been trying some ideas around the following workflow:
1) Filtering of the data (non-specific, sex chromosome genes, ...)
2) Transformation into a lower-dimensional summary subspace. For
example, if a probe has 20 beta values for one class and 20 for the
other, the transformation takes the 40-dimensional vector of beta values
and summarizes it as a 2-dimensional vector, with the first component
being the difference of the medians of the two classes, and the second
one being the difference in their IQRs. My idea was to summarize the data
and work with transformed variables that directly characterize
what I am looking for.
3) Clustering in the new subspace. For now, I am using k-means as a
baseline clustering method. My idea was to test a hierarchical method
and maybe a Bayesian dp-means, among others.
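
Steps 2 and 3 can be sketched as follows. This is only a minimal illustration in plain Python (standard library only), not the actual pipeline: the function names (`iqr`, `summarize_probe`, `kmeans`), the simulated beta values, the quartile method, and the choice of k = 3 are all assumptions made for the example.

```python
import random
from statistics import median, quantiles


def iqr(values):
    """Interquartile range (Q3 - Q1), using the 'inclusive' quantile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def summarize_probe(betas_a, betas_b):
    """Map one probe's beta values to (median difference, IQR difference)."""
    return (median(betas_b) - median(betas_a), iqr(betas_b) - iqr(betas_a))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centers[j][0]) ** 2
                + (p[1] - centers[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    rng = random.Random(1)

    def draw(mean, sd, n=20):
        """Simulated beta values (hypothetical data), clipped to [0, 1]."""
        return [min(1.0, max(0.0, rng.gauss(mean, sd))) for _ in range(n)]

    probes = []
    for _ in range(30):
        probes.append(summarize_probe(draw(0.3, 0.03), draw(0.3, 0.03)))  # unchanged
        probes.append(summarize_probe(draw(0.3, 0.03), draw(0.6, 0.03)))  # shifted up
        probes.append(summarize_probe(draw(0.3, 0.03), draw(0.3, 0.15)))  # more variable

    labels, centers = kmeans(probes, k=3)
    for j, center in enumerate(centers):
        print(j, labels.count(j), center)
```

Because k-means relies on Euclidean distance, the two components contribute according to their raw scales; the sketch leaves them unscaled.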
This is mainly an exploratory workflow. I want to know how these probes
behave according to the above variables, and I am testing different
ideas on my data. But I was wondering whether summarizing the beta
values into the new variables is the right approach, or if there is
some alternative (maybe model-based) for doing this kind of
exploratory work. Apart from losing a lot of information along the way,
am I creating other problems by doing that?
Any hint or suggestion will be appreciated.
Regards,
Gus