Installation on a cluster

Daniel Davidson ▴ 20

@daniel-davidson-1551

Hello,

I have been tasked with getting Bioconductor installed on our cluster. Because the slave nodes cannot access the Internet, the normal method of installing with

    source("http://bioconductor.org/biocLite.R")
    biocinstallPkgGroups("lite")

will not work. Does anyone have a good method of doing this on a cluster? We have a local Bioconductor mirror on the cluster that is shared over NFS.

Thanks,
Dan

updated 17.8 years ago by Claudio Lottaz • written 17.8 years ago by Daniel Davidson

Sean Davis 21k

@sean-davis-490

On Thu, Apr 17, 2008 at 9:16 AM, Daniel Davidson wrote:
> Does anyone have a good method of doing this on a cluster? We have a local Bioconductor mirror on the cluster that is shared over NFS.

Hi, Dan.

The way we do this is to make an NFS-shared /usr/local and install R there. Then use biocLite to install packages into the shared directory. The benefit of this setup is that you update packages, or R itself, once and in one place, and the update is automatically seen on all machines. An added benefit is that additional software (graphviz, netcdf, etc.) need only be installed into the shared /usr/local tree and all nodes will see it.

Of course, this assumes that your nodes are all one architecture, but since you said "cluster", I assume that is the case.

Sean
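A minimal sketch of how the shared-library install could look when the packages come from the local mirror instead of the Internet. It assumes the mirror is laid out like a standard R repository (each sub-repository has a src/contrib/PACKAGES index). The paths `/shared/bioc-mirror` and `/usr/local/lib/R/site-library`, the sub-directory names, and the helper `local_repos()` are hypothetical and need to be adapted to the site. `install.packages()` is used here rather than biocLite, since it accepts `file://` repository URLs and a `lib` argument.

```r
# Hypothetical paths: adjust both to your site.
mirror_root <- "/shared/bioc-mirror"            # assumed NFS path of the local mirror
shared_lib  <- "/usr/local/lib/R/site-library"  # assumed NFS-shared package library

# Build file:// repository URLs for the sub-repositories of a mirror.
# The default sub-directory names are an assumption about the mirror layout.
local_repos <- function(root,
                        subdirs = c("bioc", "data/annotation", "data/experiment")) {
  urls <- paste0("file://", file.path(root, subdirs))
  names(urls) <- subdirs
  urls
}

repos <- local_repos(mirror_root)

# Run once, on a machine that mounts both the mirror and the shared tree.
# Commented out so that sourcing this sketch has no side effects:
# install.packages(c("Biobase", "affy"), repos = repos, lib = shared_lib)
```

Dependencies that live on CRAN would need a local CRAN mirror added to `repos` in the same way.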

17.8 years ago • Sean Davis

Claudio Lottaz ▴ 40

@claudio-lottaz-2756

Hi folks,

Sean's suggestion for installing on a cluster is indeed easy to maintain. We did it similarly but encountered network traffic issues. When we started 50 R processes at the same time, opening plenty of shared libraries and loading data seemed to bring the network down. Has anybody else observed this kind of problem? Wouldn't it be advisable to copy the installation to local disk on all nodes after installing it in the common NFS location?

Cheers,
Claudio

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
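Claudio's idea of keeping a node-local copy of the shared installation could be sketched roughly as follows. This is only an illustration for the package library, not for R itself. The function name `sync_library()` and the example paths are hypothetical, and it assumes each node has writable local disk.

```r
# Copy every package directory from a shared library into a node-local
# library. Returns (invisibly) the names of the packages that were copied.
sync_library <- function(shared_lib, local_lib) {
  dir.create(local_lib, recursive = TRUE, showWarnings = FALSE)
  pkgs <- list.dirs(shared_lib, full.names = TRUE, recursive = FALSE)
  ok <- file.copy(pkgs, local_lib, recursive = TRUE, overwrite = TRUE)
  if (!all(ok)) warning("some packages could not be copied")
  invisible(basename(pkgs))
}

# Example use on a node (hypothetical paths, commented out on purpose):
# sync_library("/usr/local/lib/R/site-library", "/scratch/R/library")
# .libPaths(c("/scratch/R/library", .libPaths()))  # search the local copy first
```

The local copy goes stale whenever the shared library is updated, so the sync would have to be repeated after each update; that is the maintenance cost traded for less NFS traffic.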

17.8 years ago • Claudio Lottaz

On Thu, Apr 17, 2008 at 12:56 PM, Claudio Lottaz wrote:
> wouldn't it be advisable to distribute the distribution locally on all nodes after installing it in the common NFS-place?

I agree that I/O can be an issue. There are file systems that are specifically designed with some of these issues in mind (see AFS, as an example). If you are running a lot of small R processes that each last a second or less on a large cluster, reading shared libraries and the like might be an issue. However, remember that Linux caches files, so these may not be loaded from disk more than once if they can remain in cache on the nodes. This, again, will depend on the use case. Also, if you start 50 processes and they then run for 24 hours each, it is not an issue.

The other issue is a larger one. If your R processes are all accessing large quantities of data from a shared disk, then there may very well be issues. This one is harder to solve on the cluster and may require some work on the file server. If R is writing temporary files, etc., that should be done on the local nodes as far as possible.

In short, I think Claudio brings up a good point that a one-size-fits-all approach to the problem is naive. It is worthwhile learning what bottlenecks your installation and institution might face and going from there. Although I do not use them, I think there are cluster solutions that will allow you to "push" an image of the OS to the nodes in an automated fashion, but I can't imagine those can be used in a "live" cluster without some care (bringing down a few nodes at a time, for example). Someone else with more experience and knowledge will need to comment on the more complex solutions.

Sean
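The advice to keep temporary files on the local nodes could be sketched like this. It assumes that R's per-session `tempdir()` sits on node-local disk, which is common but depends on how TMPDIR is set on the cluster. The function name `with_local_scratch()` is hypothetical.

```r
# Run `work` with a scratch directory on (assumed) node-local disk, then copy
# whatever it left there to shared storage in a single pass.
# `work` is a function taking the scratch directory path as its only argument.
with_local_scratch <- function(work, shared_out, scratch_root = tempdir()) {
  scratch <- tempfile("job_", tmpdir = scratch_root)
  dir.create(scratch, recursive = TRUE)
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)  # always clean up

  work(scratch)  # intermediate reads and writes stay under `scratch`

  dir.create(shared_out, recursive = TRUE, showWarnings = FALSE)
  produced <- list.files(scratch, full.names = TRUE)
  file.copy(produced, shared_out, recursive = TRUE, overwrite = TRUE)
  invisible(basename(produced))
}

# Example use (hypothetical output path, commented out on purpose):
# with_local_scratch(
#   function(d) write.csv(head(cars), file.path(d, "result.csv")),
#   shared_out = "/shared/results/job42"
# )
```

Only the final copy touches the shared file system, so many concurrent jobs generate one burst of writes each instead of continuous small ones.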

17.8 years ago • Sean Davis

Sean Davis wrote:
> I agree that I/O can be an issue. There are file systems that are
> specifically designed with some of these issues in mind (see AFS, as
> an example).

In most use cases of AFS, the cluster nodes would not see AFS any more than any other part of the public Internet. Even if they did, the initial load of something onto a particular client machine is not going to be affected by AFS caching facilities anyway.

As an AFS admin, I don't think AFS has anything to offer to the solution of this particular problem.

--
Atro Tossavainen (Mr.), Systems Analyst, Techno-Amish & UNIX Dinosaur
The Institute of Biotechnology at the University of Helsinki, Finland, +358-9-19158939, employs me, but my opinions are my own.
URL: http://www.helsinki.fi/%7Eatossava/
NO FILE ATTACHMENTS

17.8 years ago • Atro Tossavainen
