I am working with a slightly customized Bioconductor AMI (version 3.1), on which I installed my own packages. I am trying to create a larger cluster of 50 spot instances with 32 CPUs each (c3.8xlarge) on Amazon AWS (region: EU Ireland) with the help of the pre-installed StarCluster and the parallel backends described in BiocAMI. The problem is that it is not working.
Three backend options are described on the help page of the Bioconductor AMI, and I am having problems with all of them, most importantly the SGE backend, which is the one I intended to use.
All of the following problems can be reproduced by running the minimal examples from the help page (see the link above), but on instances that have more than one CPU.
- MPI: Not working. I get "rstudio initialization error: unable to connect to service" after logging in on the master node's RStudio Server login page.
- SSH: Returns a "system2" error when using makeSSHWorker(nodename="nameofnode"), which I traced back to the function "runOScommandlinux".
- SGE: It runs, but apparently does not recognize the number of CPUs that I specify with:
param <- BatchJobsParam(50, resources=list(ncpus=32))
I believe this because a) there is no performance increase from using 50*32=1600 parallel nodes, and b) when I look at the instances' workload in the AWS console, I can see that only a small part of each instance's CPU capacity is used.
Especially regarding the SGE backend, I would appreciate information or help. Have I reached a limit with this many instances and nodes? Does anyone have experience with this?
Thank you very much for any help in advance.