Hi everybody.
Not so long ago, I asked on this list about some normalization issues.
The question and its very interesting replies, from which I have
learned a lot, can be found here:
http://comments.gmane.org/gmane.science.biology.informatics.conductor/41812
It seems to me that the more I get into bioinformatics, the less I
know. I tend to doubt everything, and I keep asking, step by step,
whether I am doing things correctly.
Now I want to test some ideas on a 450K methylation array dataset.
The main idea is to try to classify probesets into families according
to their behavior with respect to some phenotype variables. I have
several ideas I would like to try on this data, and the first step has
been to import, review, visualize and try to understand the global
structure of the beta values I have at hand.
Once the data were loaded, I made two box plots. One shows the
distribution of beta values across the 40 samples, and the other shows
the distribution across the first 100 probesets.
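
As a rough illustration of these two views, the Python sketch below
computes the five numbers a box plot draws, once per sample and once
per probe. Everything in it is an assumption made for illustration: the
`betas` matrix is simulated rather than the real data, and the helper
names `simulate_probe` and `five_number_summary` are hypothetical.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in for the real data: 100 probes x 40 samples,
# each entry a beta value in [0, 1].
N_PROBES, N_SAMPLES = 100, 40


def simulate_probe(n_samples):
    # Each simulated probe is mostly unmethylated, mostly methylated,
    # or intermediate; the shape parameters are arbitrary.
    a, b = random.choice([(1, 8), (8, 1), (4, 4)])
    return [random.betavariate(a, b) for _ in range(n_samples)]


betas = [simulate_probe(N_SAMPLES) for _ in range(N_PROBES)]  # rows = probes


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))


# One box per sample: summarize each column of the matrix.
per_sample = [
    five_number_summary([row[j] for row in betas]) for j in range(N_SAMPLES)
]
# One box per probe: summarize each row of the matrix.
per_probe = [five_number_summary(row) for row in betas]

print("sample 0:", [round(x, 3) for x in per_sample[0]])
print("probe 0: ", [round(x, 3) for x in per_probe[0]])
```

Each per-sample box pools all probes, while each per-probe box only
sees the 40 values of a single probe, so the two sets of boxes need not
look alike.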
I have shared the plots at my Google Docs account:
https://docs.google.com/open?id=0Bw-_OWjrT9U4cTlZblR0UkVhWG8
https://docs.google.com/open?id=0Bw-_OWjrT9U4d3FTZTQtNWJFUVE
My question might sound stupid, but I want to understand in depth what
is going on in these plots.
For the beta vs. probeset:
- I guess the variability is normal. Some probes are methylated most
of the time, some not, and there are a lot of differences in their
behavior. This is the common behavior, isn't it?
- A box plot might not be the best choice here, because the
distribution need not be unimodal, I think. Am I right?
- My intuition is that these values should be normalized if we are
going to use something like SVM-RFE for probeset selection.
Again, is my intuition right?
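
To make the last point concrete, here is a minimal Python sketch of one
possible per-probe preprocessing before a scale-sensitive method such
as an SVM: a base-2 logit of the beta values (often called M-values)
followed by standardization of each probe. This is only one option
under stated assumptions, not a recommendation. The function names, the
`eps` guard and the example values are hypothetical.

```python
import math
import statistics


def beta_to_m(beta, eps=1e-6):
    """Base-2 logit of a beta value; eps keeps the argument away from 0 and 1."""
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def standardize(values):
    """Center values to mean 0 and scale them to unit (population) std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # A constant probe carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


# Hypothetical beta values of one probe across six samples.
probe_betas = [0.10, 0.12, 0.85, 0.90, 0.11, 0.88]
m_values = [beta_to_m(b) for b in probe_betas]
scaled = standardize(m_values)

print([round(v, 3) for v in scaled])
```

After this step every probe has the same mean and spread, so no probe
dominates the classifier merely because of its scale.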
For the beta vs. sample:
- The data distribution seems more regular than in the other plot. Is
that an effect of the underlying normalization that GenomeStudio does?
Or is this the way beta values across samples are supposed to behave?
- Although they look regular, there are still small differences among
the medians, which made me think. Would normalizing this data benefit
subsequent experiments?
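
For reference, the kind of between-sample normalization that would
remove such median differences can be sketched as a plain quantile
normalization. Whether this is appropriate for 450K beta values is
exactly the open question; the Python code below only shows what
forcing all samples onto one distribution means. The function name and
the toy values are hypothetical, and ties are broken arbitrarily.

```python
import statistics


def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list per sample)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        # Indices of this sample's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized


# Hypothetical beta values: three samples, four probes each.
samples = [
    [0.10, 0.80, 0.50, 0.30],
    [0.20, 0.90, 0.60, 0.35],
    [0.05, 0.70, 0.40, 0.25],
]
normalized = quantile_normalize(samples)

print("medians before:", [statistics.median(s) for s in samples])
print("medians after: ", [statistics.median(s) for s in normalized])
```

After normalization every sample has the same set of values, so the
medians coincide, while the ranking of probes within each sample is
preserved.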
In general, I would like to know whether the plots show normal,
expected behavior, or whether I should normalize the data using a
predefined or standard method.
Any help or hint will be greatly appreciated.
Regards,
Gustavo