Bioconductor AMI Amazon EC2 help
David Gibbs ▴ 20
@david-gibbs-4603
Last seen 9.6 years ago
Hi there,

I'm a student working on a project with some Rmpi code that I would love to run on more nodes, but I'm having trouble with the Bioconductor image.

I'm following the directions for spinning up a cluster found here: http://www.bioconductor.org/help/bioconductor-cloud-ami/

Once I have the cluster up with 3 nodes, I run the mpiTest.R script. It gets to the spawning function, then hangs. After about half an hour I killed it. Is anyone getting this to work? Any hints? See the output below. Thanks!

...
...
Creating volume...
I, [2011-04-20T03:31:10.592881 #763] INFO -- : New RightAws::Ec2 using shared connections mode
I, [2011-04-20T03:31:10.681741 #763] INFO -- : Opening new HTTPS connection to ec2.amazonaws.com:443
warning: peer certificate won't be verified in this SSL session
Waiting for volume to be available...
.
Volume is available.
Created volume vol-d33fd5b8 in availability zone us-east-1d.

...
...
# /usr/local/Rmpi/mpiutil -a xxx -s yyy -w 3 -n "my cluster" -t t1.micro -v vol-d33fd5b8
warning: peer certificate won't be verified in this SSL session
using device /dev/sdg...
waiting for volume to be attached....
.......Volume is attached.
waiting for workers to start...
.....................workers are up
Cluster started.
...
...
...
> library(Rmpi)
>
> mpi.spawn.Rslaves(nslaves = nsl)
Warning: Permanently added 'worker002,10.215.117.28' (RSA) to the list of known hosts.
Warning: Permanently added 'worker003,10.96.55.43' (RSA) to the list of known hosts.
Warning: Permanently added 'worker001,10.206.198.16' (RSA) to the list of known hosts.
^Cmpirun: killing job...

--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
worker001 - daemon did not report back when launched
worker002 - daemon did not report back when launched
worker003 - daemon did not report back when launched

Thanks very much!
David Gibbs
OHSU student
Dan Tenenbaum ★ 8.2k
@dan-tenenbaum-4256
Last seen 3.2 years ago
United States
Hi David,

On Tue, Apr 19, 2011 at 9:09 PM, David Gibbs <gibbsd at ohsu.edu> wrote:
> Once I have the cluster up with 3 nodes, I run the mpiTest.R script... it gets to the spawning function, then hangs.
> ...
> worker001 - daemon did not report back when launched
> worker002 - daemon did not report back when launched
> worker003 - daemon did not report back when launched

Sometimes it takes a few moments for the workers to be ready. Did you try the test script a few times? If it doesn't respond within a minute, press ^C and then try the test script again.

I just tried it and it worked, but I had to try a couple of times before the workers were ready (I didn't have to interrupt with ^C, though). Once they were ready, I could run the test script multiple times.

Thanks for your interest in the AMI. Let us know if you need further help.

Dan

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
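Dan's advice (give the workers a moment, interrupt the test if it has not responded within about a minute, then run it again) can be scripted instead of done by hand. The following is a minimal sketch, assuming a Linux shell with GNU coreutils `timeout`; the function name `retry_with_timeout`, the `RETRY_PAUSE` variable and the 5-second default pause are hypothetical, not part of the AMI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run COMMAND with a time limit, retrying a few times.
# Mirrors "if it doesn't respond within a minute, ^C and try again".
# COMMAND must be an executable (not a shell function), because GNU
# `timeout` executes it directly.
#
# Usage: retry_with_timeout ATTEMPTS SECONDS COMMAND [ARGS...]
# Pause between attempts: $RETRY_PAUSE seconds (hypothetical; default 5).
retry_with_timeout() {
  local attempts=$1 secs=$2 pause=${RETRY_PAUSE:-5} i=1 status
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # GNU timeout sends SIGTERM after $secs seconds and then exits with 124.
    timeout "$secs" "$@"
    status=$?
    [ "$status" -eq 0 ] && return 0
    if [ "$status" -eq 124 ]; then
      echo "attempt $i/$attempts: no response after ${secs}s" >&2
    else
      echo "attempt $i/$attempts: failed with exit status $status" >&2
    fi
    # Give the workers a little more time before the next attempt.
    [ "$i" -lt "$attempts" ] && sleep "$pause"
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry_with_timeout 3 60 Rscript mpiTest.R` would allow three one-minute attempts, assuming `Rscript mpiTest.R` is how you launch the test script on your setup. A timeout stops the job much like ^C does, so leftover daemons may still need the "orte-clean" cleanup mentioned in the error message above. Retrying only helps when the workers are merely slow to come up; it cannot fix a network that blocks traffic between the nodes.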
Dan Tenenbaum ★ 8.2k
@dan-tenenbaum-4256
Last seen 3.2 years ago
United States
On Wed, Apr 20, 2011 at 1:46 PM, David Gibbs <gibbsd at ohsu.edu> wrote:
> Thanks Dan,
>
> I found my problem. :)
>
> I was running with a security group that had no open ports *within* the group.
> I only had the SSH port open, so I could log in and start up instances, but the instances couldn't
> communicate with each other. A possible change to the Bioconductor cloud AMI page is to
> explicitly state what the security group should look like.
>
> Thanks for your help. I'm ready to do some computations!
>

Thanks Dave, I'm sharing this with the list in case it's helpful for anyone else. I'll update the documentation as you suggest.

Dan

> -dave
>
> ________________________________________
> From: Dan Tenenbaum [dtenenba at fhcrc.org]
> Sent: Tuesday, April 19, 2011 10:00 PM
> To: David Gibbs
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Bioconductor AMI Amazon EC2 help
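David's diagnosis fits the "daemon did not report back when launched" lines. SSH (port 22) was allowed, so mpirun could log in to the workers and start its daemons. As far as I understand Open MPI's launch sequence, those daemons must then open TCP connections back to mpirun on other ports, and a security group with no rules inside the group drops that traffic. One way to check a cluster before running Rmpi is to probe a non-SSH port between two nodes. This is a minimal sketch, assuming bash built with `/dev/tcp` support (the usual case on Linux) and GNU `timeout`; the function name `check_port` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a TCP connection to HOST:PORT can be
# opened within SECONDS (default 3).
#
# Usage: check_port HOST PORT [SECONDS]
check_port() {
  local host=$1 port=$2 secs=${3:-3}
  # The inner bash opens and immediately closes a socket via /dev/tcp.
  # A refused connection fails at once; packets silently dropped by a
  # firewall leave the connect hanging until `timeout` gives up.
  if timeout "$secs" bash -c ': </dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open: $host:$port"
    return 0
  fi
  echo "unreachable: $host:$port"
  return 1
}
```

For example, start a throwaway listener on the master with `python3 -m http.server 8000` (port 8000 is an arbitrary choice), then run `check_port <master-private-ip> 8000` on worker001. If the probe times out while SSH between the same machines works, the security group most likely needs an inbound rule that allows traffic whose source is the group itself, which is what David's fix amounts to.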