Hey Jeremy,
There are a couple of things you can do. First off, how bad are the
plots? Feel free to send them to me. In practice, the plots have to be
REALLY bad, such as highly skewed or bimodal, to make a difference. If
your mean plots are fairly symmetric, using the parametric adjustment
is probably fine.
For parallelization, the easiest thing to do--because your 450K
dataset is so large--is to randomly chop your dataset into 10 segments
and then run them separately. This might do the trick and get you
down to hours or days, and likely won't impact the results much (plus
you can check that by running it a couple of times). Finally, the
nonparametric iterations can likely be implemented with an "apply"
function in R (and then run in parallel using the 'snow' package or
something similar) instead of the loop, but it would be fairly involved
(lots of manipulation of matrices and indexing), which is why I haven't
already done it.
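A minimal sketch of the chop-into-segments idea above, in case it helps. It assumes the data are a probes-by-samples matrix and that the segments are random subsets of probes (rows); that reading, the helper name adjust_in_chunks, and its arguments are all made up for illustration and are not part of ComBat. The adjustment itself is passed in as a function, so nothing here depends on ComBat's internals.

```r
library(parallel)

# Hypothetical helper (not part of ComBat or sva): adjust a probes-x-samples
# matrix in random row chunks and reassemble the result in the original order.
adjust_in_chunks <- function(dat, batch, adjust_fun, n_chunks = 10, cores = 1) {
  n <- nrow(dat)
  # Randomly assign every probe (row) to one of n_chunks roughly equal segments.
  chunk_id <- sample(rep_len(seq_len(n_chunks), n))
  rows_by_chunk <- split(seq_len(n), chunk_id)

  # Adjust each segment independently; mc.cores > 1 forks on Unix-alikes only.
  adjusted <- mclapply(
    rows_by_chunk,
    function(rows) adjust_fun(dat[rows, , drop = FALSE], batch),
    mc.cores = cores
  )

  # Write each adjusted segment back into its original row positions.
  out <- dat
  for (k in seq_along(rows_by_chunk)) {
    out[rows_by_chunk[[k]], ] <- adjusted[[k]]
  }
  out
}

# Example call, assuming the sva package is installed and that `dat` and
# `batch` already exist in the session:
# adjusted <- adjust_in_chunks(
#   dat, batch,
#   adjust_fun = function(x, b) sva::ComBat(dat = x, batch = b, mod = NULL,
#                                           par.prior = FALSE),
#   cores = 4
# )
```

Because the split is random, calling this a couple of times and comparing the outputs gives a rough idea of how much the segmentation changes the adjusted values.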
Anyway, hope this helps!
Evan
On Sep 10, 2013, at 10:43 AM, Jeremy Rosenblum wrote:
I am trying to normalize between 2 Illumina 450K methylation arrays
(~480000 probes/sample for 16 samples). The density plots
(prior.plots=T) of the 2 arrays are such that I need to use ComBat in
the nonparametric mode (red and black do not overlay well), which as
expected took a very long time (1 month). Is there a way to
parallelize ComBat so that it can run faster?
Thanks
Jeremy Rosenblum, MD
Assistant Professor of Pediatrics
Pediatric Hematology/Oncology
Children's Hospital at Montefiore
Office: (718)-741-2342
Lab: (718)-678-1162
Fax: (718)-920-6506