Copy Number Analysis for MapReduce
@mcoyneboninccom-3525
I'm looking for a copy number analysis implementation that fits the Hadoop/MapReduce paradigm. If anyone has used, or has experience with, copy number analysis that can run on Hadoop/MapReduce, I would appreciate pointers.

Hadoop is a software framework on Linux for large-scale distributed data analysis. It uses the MapReduce paradigm to implement fault-tolerant distributed computing over large datasets stored on a cluster's distributed file system. In the MapReduce paradigm, program execution is divided into a Map stage and a Reduce stage, and the work within each stage is done in parallel. For that reason, I am looking for a copy number analysis algorithm that fits into the MapReduce paradigm.

Thanks,
My Coyne
@steve-lianoglou-2771
Hi,

On Sat, Mar 17, 2012 at 12:54 PM, My Coyne <mcoyne at boninc.com> wrote:
> I'm in search of copy number analysis implementation that would fit for Hadoop/Mapreduce paradigm; I appreciate if anyone has used/experienced with copy number analysis that can be used with Hadoop/Mapreduce and point me to those.
>
> Hadoop is a software framework on Linux that allows for large scale distributed data analysis. Hadoop uses MapReduce paradigm to implement its fault tolerant distributed computing system over large datasets on cluster's distributed file system. In Mapreduce paradigm there are separate Map and Reduce steps, each step is done in parallel; hence program execution is divided into a Map and a Reduce stage. For such reason, I am looking for Copy Number Analyssis Algorithm fits into the MapReduce paradigm.

You might want to start looking at the GATK:

http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

I'm not sure if it has exactly what you want, but it could be a good place to start as a foundation/toolbox if you're looking to build such a thing.

From their website:

"""
The Genome Analysis Toolkit (GATK) is a structured programming framework designed to enable rapid development of efficient and robust analysis tools for next-generation DNA sequencers. The GATK solves the data management challenge by separating data access patterns from analysis algorithms, using the functional programming philosophy of Map/Reduce
"""

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
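For illustration, here is a rough idea of how the first step of a read-depth copy number analysis could be phrased as Map and Reduce steps. This is only a minimal sketch in plain Python: it is not GATK code and not a Hadoop job. The function names, the `BIN_SIZE` value, the sample labels, and the toy reads are all hypothetical, and `shuffle` merely stands in for the grouping a MapReduce framework would do between the two stages. The map step emits one count per aligned read keyed by (sample, chromosome, bin), and the reduce step sums the counts for each key.

```python
import math
from collections import defaultdict

BIN_SIZE = 1000  # hypothetical bin width in base pairs


def map_read(sample, chrom, pos):
    """Map step: emit one ((sample, chrom, bin_index), 1) pair per aligned read."""
    yield (sample, chrom, pos // BIN_SIZE), 1


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle/sort phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the per-read counts that share one key."""
    return key, sum(values)


def bin_counts(reads):
    """Run map, shuffle, and reduce in-process over (sample, chrom, pos) reads."""
    pairs = (pair for read in reads for pair in map_read(*read))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


def log2_ratios(counts, tumor="tumor", normal="normal", pseudocount=1.0):
    """Per-bin log2(tumor/normal) depth ratio; the pseudocount avoids dividing by zero."""
    bins = {(chrom, b) for (_, chrom, b) in counts}
    return {
        (chrom, b): math.log2(
            (counts.get((tumor, chrom, b), 0) + pseudocount)
            / (counts.get((normal, chrom, b), 0) + pseudocount)
        )
        for (chrom, b) in bins
    }


# Toy input: (sample, chromosome, alignment start position)
reads = [
    ("tumor", "chr1", 150), ("tumor", "chr1", 820), ("tumor", "chr1", 990),
    ("normal", "chr1", 400),
    ("tumor", "chr1", 1500), ("normal", "chr1", 1700),
]
counts = bin_counts(reads)
ratios = log2_ratios(counts)
```

Counting reads per bin splits cleanly by key, which is why it fits the paradigm. Segmentation, the usual next step, looks at neighbouring bins in genomic order, so it probably does not decompose the same way; one possible approach is to key the reduce step by sample or chromosome and run an ordinary segmentation routine inside each reducer.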