A survey of computational setups for Illumina Infinium DNA methylation EPIC array pipeline

Main Question

What computational setups do you use for running EPIC array preprocessing pipelines?


  • Where do you run your pipeline?
    • e.g. Personal PC, local network, HPC using scheduler like SLURM, etc.
  • What configuration do you use?
    • Nodes, cores, CPU, memory per CPU (or equivalent)
  • What kinds of projects do you work on?
    • Number of samples, common tasks, and any significant bottlenecks worth mentioning.

Some background

I'm preprocessing EPIC array data from 1,800 samples in RStudio. The pipeline runs on Amazon Web Services (AWS) through a service called Ronin, which lets me customize the machine by processor, number of virtual CPUs, and total memory.

Nevertheless, even on a large compute-optimized machine (32 vCPUs with 256 GiB of memory), loading the .idat files, computing PC scores, normalizing, and so on is still very slow. Part of the problem seems to be that some packages (e.g. minfi) do not take advantage of parallelization, so certain steps run on a single CPU. I feel like there has to be a better way.
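One workaround I've been considering is to parallelize the idat-loading step myself by splitting the sample sheet into batches and reading each batch on a separate worker. A rough sketch, assuming a minfi-style sample sheet with a Basename column (the chunk size of 100 and worker count of 16 are arbitrary, and this is untested on my data):

```r
## Hypothetical sketch: read .idat files in parallel batches.
## Function names come from minfi/BiocParallel; the chunking scheme
## is my own assumption, not an established recipe.
library(minfi)
library(BiocParallel)

targets <- read.metharray.sheet("idat_dir")   # locate the sample sheet
chunks  <- split(targets, ceiling(seq_len(nrow(targets)) / 100))

param  <- MulticoreParam(workers = 16)
rgsets <- bplapply(chunks, function(chunk) {
  read.metharray(chunk$Basename, verbose = FALSE)
}, BPPARAM = param)

## Combine the per-chunk RGChannelSets back into one object
rgset <- Reduce(BiocGenerics::combine, rgsets)
```

Whether the final combine step fits in memory for 1,800 samples is exactly the kind of thing I'd like to hear about from others.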

I'm looking to improve my pipeline times and avoid crashes and hang-ups, and I thought this community might have thoughts. I've looked into the "big-data" versions of common packages (bigmelon, meffil), but I don't know anyone who has used them.
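For context, my understanding from the meffil documentation is that it avoids holding all arrays in memory at once by working with per-sample QC objects, and that it parallelizes via the mc.cores option. A minimal sketch of what I think that workflow looks like (function names follow the meffil tutorial; the arguments here are illustrative, and I haven't run this myself):

```r
## Hypothetical meffil workflow sketch, untested on my data.
library(meffil)
options(mc.cores = 16)  # meffil parallelizes via this option

samplesheet  <- meffil.create.samplesheet("idat_dir")
qc.objects   <- meffil.qc(samplesheet, verbose = TRUE)
norm.objects <- meffil.normalize.quantile(qc.objects, number.pcs = 10)
beta         <- meffil.normalize.samples(norm.objects, verbose = TRUE)
```

If anyone has compared this against a minfi-based pipeline at a similar sample size, I'd be very interested in the memory footprint and wall-clock differences.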

DNAMethylation wateRmelon Computation minfi
