Bioconductor Cloud AMI

Rohmatul Fajriyah ▴ 190

@rohmatul-fajriyah-5675

Last seen 11.4 years ago

Dear All,

I have a simulation program in R that I run on my laptop. Producing one result takes about 7 hours, and I need about 50, 100, and 500 results. (I use Mac OS on a MacBook Pro; the hardware specs are a 2.9 GHz dual-core Intel Core i7 processor and 8 GB of RAM.)

I then tried to learn and use cloud computing, and ended up with the Bioconductor Cloud AMI. I chose the High-Memory Quadruple Extra Large Instance (API name: m2.4xlarge), hoping that it would produce the results faster.

Instead, the Amazon cloud instance took longer than my laptop: about 8 hours for one result.

I am really new to cloud computing. Could you help me find out what mistake I made in using or setting up the cloud instance, so that it took longer than my laptop? What should I do to make it faster?

Thank you very much in advance for your attention and kindness. Happy New Year 2013.

With kind regards,
R Fajriyah

• 1.4k views

updated 13.1 years ago by Martin Morgan 25k • written 13.1 years ago by Rohmatul Fajriyah ▴ 190

Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 6 days ago

United States

On Tue, Jan 1, 2013 at 5:59 AM, Rohmatul Fajriyah <rfajriyah@yahoo.com> wrote:

> I took the High-Memory Quadruple Extra Large Instance, API name: m2.4xlarge.
> With the hope that it will make it faster in producing the results.

The definition of the m2.4xlarge instance is:

68.4 GiB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
EBS-Optimized Available: 1000 Mbps
API name: m2.4xlarge

You need to write the simulation code to take advantage of the multiplicity of nodes/cores, the quantity of memory, and the quantity of disk. If your tasks are not explicitly matched to the capabilities of this system, resources will sit idle at your expense, and the program can be expected to run in about the same time as it takes on your laptop.

I would suggest that you try to identify a piece of your simulation that completes in a modest period of time, say two minutes. Then find some elements of the computations that can be executed simultaneously. Rewrite these using functions in the parallel package (or look at the BiocParallel package, now available only in the devel branch of Bioconductor) and demonstrate on your multicore laptop that you have achieved some speedup by doing this. Once you have accomplished this, you can consider whether you can benefit from further redesign, and then from deployment on an AMI instance type whose resources will actually contribute to faster completion of your task.

The vignette "Using the foreach package" in the foreach package from CRAN has nice examples of how to reformulate iterative programs to use concurrent computing facilities. The CRAN task view has many relevant items:

http://cran.r-project.org/web/views/HighPerformanceComputing.html

Amdahl's law is also very relevant:

http://en.wikipedia.org/wiki/Amdahl's_law

> What happened was the amazon cloud computing produced the result longer than my laptop.
> It took about 8 hours for 1 result.
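As a rough illustration of the suggestion above (not code from the original poster), here is a minimal sketch of running independent simulation replicates with the parallel package. The function one_replicate() is a hypothetical stand-in for a small piece of the real simulation, and the choice of 2 worker processes is an assumption matching a dual-core laptop. The helper amdahl_speedup() is likewise my own name for the standard Amdahl's law bound, 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of cores.

```r
library(parallel)

# Hypothetical stand-in for one simulation replicate; the real simulation
# is not shown in this thread.
one_replicate <- function(seed, n = 1e5) {
  set.seed(seed)        # one seed per replicate keeps results reproducible
  mean(rnorm(n))
}

seeds <- 1:8

# Serial baseline.
serial <- lapply(seeds, one_replicate)

# Parallel version. mclapply() forks worker processes, which works on
# Unix-like systems (Mac OS, Linux). On Windows mc.cores must be 1;
# makeCluster() with parLapply() is the usual alternative there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
par_res <- mclapply(seeds, one_replicate, mc.cores = n_cores)

# Both versions should give the same answers, since each replicate
# sets its own seed.
identical(serial, par_res)

# Amdahl's law: upper bound on speedup when a fraction p of the work
# is parallelizable and n cores are available.
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

# If 90% of the run time parallelizes, 8 cores give at most about 4.7x.
amdahl_speedup(0.9, 8)
```

Wrapping the lapply() and mclapply() calls in system.time() on the laptop would show whether any speedup is actually achieved before paying for a larger instance.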

written 13.1 years ago by Vincent J. Carey, Jr. 6.7k

Dear Sir,

Thank you very much for your response. I'll read and digest it. I hope I can find a solution for my simulation.

Thank you.

With kind regards,
R Fajriyah

(I became a member of this mailing list today.)

written 13.1 years ago by Rohmatul Fajriyah ▴ 190

Martin Morgan 25k

@martin-morgan-1513

Last seen 19 days ago

United States

On 01/01/2013 02:59 AM, Rohmatul Fajriyah wrote:

> I have a simulation program in R, which I run it in my laptop.
> To produce 1 result, it took about 7 hours and I need about 50, 100, and 500 results.

It's worth asking whether the amount of time to do a simulation seems reasonable for the processing power of a modern computer -- maybe there are some inefficiencies in your R code whose removal could result in speed-ups of 100-fold or more? At a high level, you can probably step through your code and see where the obvious bottlenecks are. If that fails, then ?Rprof might be helpful.

Also, for that kind of computing investment it's worth asking whether you are implementing your simulation with an appropriate algorithm; perhaps there are ways to re-formulate the problem in a more insightful way, leading to both greater throughput and perhaps analytic or conceptual insight.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
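To make the profiling suggestion concrete, here is a small sketch of how Rprof() and summaryRprof() can point at a bottleneck. The functions slow_sim() and fast_sim() are hypothetical examples of a common inefficiency (growing a vector inside a loop) and its vectorized replacement; they are not taken from the poster's simulation, and the problem size is an arbitrary choice.

```r
# Hypothetical inefficient code: the vector is re-allocated and copied
# on every iteration.
slow_sim <- function(n) {
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, sqrt(i))
  sum(out)
}

# Vectorized version of the same computation.
fast_sim <- function(n) sum(sqrt(seq_len(n)))

n <- 3e4

# Profile the slow version; samples are written to a temporary file.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
slow_result <- slow_sim(n)
Rprof(NULL)   # stop profiling

# summaryRprof() raises an error if the run was too short to record any
# samples, so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))

# The two versions agree; system.time() shows the difference in cost.
fast_result <- fast_sim(n)
all.equal(slow_result, fast_result)
system.time(slow_sim(n))
system.time(fast_sim(n))
```

In the by.self table, the functions with the largest self time are the first candidates for rewriting.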

written 13.1 years ago by Martin Morgan 25k

Dear Sir,

Thank you for your response. Yes, I have identified the bottleneck in my simulation, which I thought could be solved by cloud computing. Perhaps my understanding of it was wrong. Based on the responses from you and Dr. Carey, I'll try some possibilities.

Thank you very much. It is really appreciated.

With kind regards,
R Fajriyah

written 13.1 years ago by Rohmatul Fajriyah ▴ 190
