Running ComBat in parallel
@w-evan-johnson-5447
United States
Hey Jeremy,

There are a couple of things you can do. First off, how bad are the plots? Feel free to send them to me. In practice, the plots have to be REALLY bad, such as highly skewed or bimodal, to make a difference. If your mean plots are fairly symmetric, using the parametric adjustment is probably fine.

For parallelization, the easiest thing to do, because your 450K dataset is so large, is to randomly chop your dataset into 10 segments and then run them separately. This might do the trick and get you down to hours or days, and it likely won't impact the results much (plus you can check that by running it a couple of times).

Finally, the nonparametric iterations can likely be implemented with an "apply" function in R (and then run in parallel using the 'snow' package or something similar) instead of the loop, but it would be fairly involved (lots of manipulation of matrices and indexing), which is why I haven't already done it.

Anyway, hope this helps!

Evan

In reply to Jeremy Rosenblum's message of Sep 10, 2013, 10:43 AM (the original question, posted below).
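A minimal sketch of the random-split idea described in the reply above, written in Python rather than R so that it is self-contained. All names here (`random_row_chunks`, `center_batches`, `adjust_in_chunks`, `map_fn`) are hypothetical and not part of ComBat or the sva package. In particular, `center_batches` is only a stand-in for the real adjustment: it removes per-batch mean differences row by row and does not perform ComBat's empirical Bayes estimation.

```python
import random
from functools import partial
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # rows = probes, columns = samples


def random_row_chunks(n_rows: int, n_chunks: int, seed: int) -> List[List[int]]:
    """Shuffle the row indices and deal them into n_chunks near-equal groups."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_chunks] for k in range(n_chunks)]


def center_batches(rows: Matrix, batch: Sequence[str]) -> Matrix:
    """Stand-in adjustment (NOT ComBat): for each row, subtract the mean of
    each batch and add back the row's overall mean."""
    out = []
    for row in rows:
        overall = sum(row) / len(row)
        means = {}
        for b in set(batch):
            vals = [v for v, lab in zip(row, batch) if lab == b]
            means[b] = sum(vals) / len(vals)
        out.append([v - means[lab] + overall for v, lab in zip(row, batch)])
    return out


def adjust_in_chunks(
    data: Matrix,
    batch: Sequence[str],
    adjust: Callable[[Matrix, Sequence[str]], Matrix],
    n_chunks: int = 10,
    seed: int = 0,
    map_fn: Callable = map,
) -> Matrix:
    """Split rows at random, adjust each chunk separately, and put the
    adjusted rows back in their original order."""
    chunks = random_row_chunks(len(data), n_chunks, seed)
    sub_matrices = [[data[i] for i in chunk] for chunk in chunks]
    # map_fn defaults to the serial built-in map; a parallel map with the
    # same calling convention can be passed in instead.
    pieces = map_fn(partial(adjust, batch=batch), sub_matrices)
    result: Matrix = [[] for _ in data]
    for chunk, piece in zip(chunks, pieces):
        for i, new_row in zip(chunk, piece):
            result[i] = new_row
    return result


if __name__ == "__main__":
    rng = random.Random(1)
    batch = ["array1", "array1", "array2", "array2"]
    data = [[rng.gauss(0.0, 1.0) for _ in batch] for _ in range(25)]
    adjusted = adjust_in_chunks(data, batch, center_batches, n_chunks=10)
    print(len(adjusted), "rows adjusted in 10 random chunks")
```

Because the stand-in works on one row at a time, the chunked result is identical to the unchunked one. A method that pools information across rows, as ComBat does, would give slightly different results for different partitions, so rerunning with a few different `seed` values and comparing the outputs is one way to do the check suggested in the reply. For real parallelism, `map_fn` could be the `map` method of a `concurrent.futures.ProcessPoolExecutor`, provided `adjust` is a module-level function.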
@jeremy-rosenblum-6141
I am trying to normalize between 2 Illumina 450K methylation arrays (~480000 probes/sample for 16 samples). The density plots (prior.plots=T) of the 2 arrays are such that I need to use ComBat in the nonparametric mode (red and black do not overlay well), which as expected took a very long time (1 month). Is there a way to parallelize ComBat so that it can run faster?

Thanks

Jeremy Rosenblum, MD
Assistant Professor of Pediatrics
Pediatric Hematology/Oncology
Children's Hospital at Montefiore
@w-evan-johnson-5447
Hey Jeremy,

Your plots look fine. I'd actually suggest that you use the parametric ComBat on your data. However, you are correct about your 'random rows' description; that is exactly what I meant previously.

Evan

On Sep 10, 2013, at 11:42 PM, Jeremy Rosenblum wrote:

Hey Evan,

Thanks for replying! I have attached the plots for your review. I wasn't totally clear on exactly how ComBat works, so I wasn't sure whether I could divide the data into smaller pieces (too much time in medical school and hospital and not enough advanced statistics!). If I understand you correctly, I should take random rows, put them together into smaller matrices, and run ComBat on each one. If you agree that I need to run the nonparametric adjustment, I will definitely give that a try.

Thanks for your help and for developing ComBat in the first place.

JR

(Attachment: combat_plot.pdf. Sent in reply to Evan's message of Tuesday, September 10, 2013, 12:30 PM, the first reply above.)