David Gibbs
Hi there,
I'm a student working on a project with some Rmpi code that I would
love to run on more nodes, but I'm having trouble with the
Bioconductor image.
I'm following the directions for spinning up a cluster found here:
http://www.bioconductor.org/help/bioconductor-cloud-ami/
Once the cluster is up with 3 nodes, I run the mpiTest.R script. It
gets as far as the spawning function (mpi.spawn.Rslaves) and then hangs.
After about half an hour I killed it. Has anyone gotten this to work?
Any hints? See the output below. Thanks!
...
...
Creating volume...
I, [2011-04-20T03:31:10.592881 #763] INFO -- : New RightAws::Ec2 using shared connections mode
I, [2011-04-20T03:31:10.681741 #763] INFO -- : Opening new HTTPS connection to ec2.amazonaws.com:443
warning: peer certificate won't be verified in this SSL session
Waiting for volume to be available...
.
Volume is available.
Created volume vol-d33fd5b8 in availability zone us-east-1d.
...
...
# /usr/local/Rmpi/mpiutil -a xxx -s yyy -w 3 -n "my cluster" -t t1.micro -v vol-d33fd5b8
warning: peer certificate won't be verified in this SSL session
using device /dev/sdg...
waiting for volume to be attached....
.......Volume is attached.
waiting for workers to start...
.....................workers are up
Cluster started.
...
...
...
> library(Rmpi)
>
> mpi.spawn.Rslaves(nslaves = nsl)
Warning: Permanently added 'worker002,10.215.117.28' (RSA) to the list of known hosts.
Warning: Permanently added 'worker003,10.96.55.43' (RSA) to the list of known hosts.
Warning: Permanently added 'worker001,10.206.198.16' (RSA) to the list of known hosts.
^Cmpirun: killing job...
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
worker001 - daemon did not report back when launched
worker002 - daemon did not report back when launched
worker003 - daemon did not report back when launched
Thanks very much!
David Gibbs
OHSU student