snow library, question on clusterExport
@mattia-pelizzola-3304
Last seen 6 months ago
Italy
Hi,

I have a simple function:

> library(snow)
> fun2=function() {
+ cl=makeCluster(3)
+ Mat=matrix(2:10,3,3)
+ fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
+ print(clusterApplyLB(cl, 1:3, fun3))
+ stopCluster(cl)
+ }

that is working fine:

> fun2()
[[1]]
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

[[2]]
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    4    7   10

[[3]]
[1]  4  7 10

Now, if I run the same commands outside the function:

> cl=makeCluster(3)
> Mat=matrix(2:10,3,3)
> fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
> print(clusterApplyLB(cl, 1:3, fun3))
Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: object 'Mat' not found

so I figured out I have to export 'Mat' to the cluster nodes:

> clusterExport(cl, 'Mat')
> print(clusterApplyLB(cl, 1:3, fun3))
[[1]]
     [,1] [,2] [,3]
[1,]    2    5    8
[2,]    3    6    9
[3,]    4    7   10

[[2]]
     [,1] [,2] [,3]
[1,]    3    6    9
[2,]    4    7   10

[[3]]
[1]  4  7 10

I still do not understand why clusterExport is NOT necessary within the function 'fun2', where it actually gives an error:

> rm(Mat)
> fun2=function() {
+ cl=makeCluster(3)
+ Mat=matrix(2:10,3,3)
+ clusterExport(cl, 'Mat')
+ fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
+ print(clusterApplyLB(cl, 1:3, fun3))
+ stopCluster(cl)
+ }
> fun2()
Error in get(name, env = .GlobalEnv) : object 'Mat' not found

I found clusterExport to be the solution for a more complex example, but I can't make it work within a function.
What is happening here with clusterExport? And how can I export an object that is not in my globalEnv but rather is created within a function?

many thanks!

mattia

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] snow_0.3-3
@martin-morgan-1513
Last seen 5 days ago
United States
Hi Mattia -- Probably the newsgroup

  https://stat.ethz.ch/mailman/listinfo/r-sig-hpc

is appropriate, but...

On 04/09/2010 03:47 PM, mattia pelizzola wrote:
> Hi,
>
> I have a simple function:
>
> > library(snow)
> > fun2=function() {
> + cl=makeCluster(3)
> + Mat=matrix(2:10,3,3)
> + fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
> + print(clusterApplyLB(cl, 1:3, fun3))
> + stopCluster(cl)
> + }

A function includes, as part of its definition, the environment it is defined in. So

  f1 <- function() {
      f2 <- function() {}
      x <- 1
      browser()
  }

  > f1()
  Called from: f1()
  Browse[1]> environment(f2)
  <environment: 0xb540b0>
  Browse[1]> ls(environment(f2))
  [1] "f2" "x"
  Browse[1]> environment(f2)[["x"]]
  [1] 1

In something like clusterApplyLB, snow sends 'fun3' to the worker. This includes 'fun3's environment, and that in turn includes the variable 'Mat'. Note that this could be a big surprise, e.g.,

  f1 = function() {
      f2 = function(i) i^2
      m = matrix(numeric(1e7), 1e3)
      clusterApplyLB(cl, 1:10, f2)
  }

sends the matrix 'm' to each node in the cluster (because it is defined in the environment of f2), even though it is irrelevant to the calculation performed by f2.

To illustrate

  f1 <- function(cl, x, do) {
      f2 <- function(i) ls(environment())
      y <- x
      if (do) clusterApply(cl, 1:2, f2)
  }

this sends a short vector

  > x <- integer(1); system.time(f1(cl, x, TRUE))
     user  system elapsed
    0.000   0.000   0.001

and a long vector, so takes more time

  > x <- integer(1e6); system.time(f1(cl, x, TRUE))
     user  system elapsed
    0.096   0.040   0.329

and here demonstrating that it's not the vector per se, but the transport

  > x <- integer(1e6); system.time(f1(cl, x, FALSE))
     user  system elapsed
        0       0       0

> that is working fine:
>
> > fun2()
> [[1]]
>      [,1] [,2] [,3]
> [1,]    2    5    8
> [2,]    3    6    9
> [3,]    4    7   10
>
> [[2]]
>      [,1] [,2] [,3]
> [1,]    3    6    9
> [2,]    4    7   10
>
> [[3]]
> [1]  4  7 10
>
> now, if I run the same commands outside the function:
>
> > cl=makeCluster(3)
> > Mat=matrix(2:10,3,3)
> > fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
> > print(clusterApplyLB(cl, 1:3, fun3))
> Error in checkForRemoteErrors(val) :
>   3 nodes produced errors; first error: object 'Mat' not found

Here a special rule applies, which is 'do not export the global environment': it is sent only as a reference, not by content. So environment(fun3) is .GlobalEnv, and 'Mat' is not exported, and not available to the worker.

> so I figured out I have to export 'Mat' on the cluster nodes:
>
> > clusterExport(cl, 'Mat')
> > print(clusterApplyLB(cl, 1:3, fun3))
> [[1]]
>      [,1] [,2] [,3]
> [1,]    2    5    8
> [2,]    3    6    9
> [3,]    4    7   10
>
> [[2]]
>      [,1] [,2] [,3]
> [1,]    3    6    9
> [2,]    4    7   10
>
> [[3]]
> [1]  4  7 10
>
> I still do not understand why clusterExport is NOT necessary within
> the function 'fun2' and actually it would give an error:
>
> > rm(Mat)
> > fun2=function() {
> + cl=makeCluster(3)
> + Mat=matrix(2:10,3,3)
> + clusterExport(cl, 'Mat')
> + fun3=function(startInd, endInd=3, data=Mat) {Mat[startInd:endInd,]}
> + print(clusterApplyLB(cl, 1:3, fun3))
> + stopCluster(cl)
> + }
> > fun2()
> Error in get(name, env = .GlobalEnv) : object 'Mat' not found

from ?clusterExport,

  'clusterExport' assigns the global values on the master of the variables named in 'list' to variables of the same names in the global environments of each node.

so snow is just doing what it is documented to do.

> I found clusterExport to be the solution for a more complex example,
> can I can't make it working within a function.
> What is it happening here with clusterExport? and how can I export an
> object that is not on my globalEnv but rather is created within a
> function?

Hope that provides enough information to work through your problem.

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
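The point about a function carrying its defining environment can be checked without a cluster. The following is a minimal sketch, assuming that serialize() is a fair stand-in for what has to be transmitted to a worker; make_worker_fun and the variable names are hypothetical and only for illustration.

```r
# Sketch: how much data accompanies a function when it is serialized.

make_worker_fun <- function(n) {      # hypothetical helper
  big <- numeric(n)                   # never used by f, but shares its environment
  f <- function(i) i^2
  f
}

small <- make_worker_fun(1)
large <- make_worker_fun(1e6)

# Same function body, re-parented to the global environment, which is
# serialized as a reference rather than by content.
lean <- large
environment(lean) <- globalenv()

sizes <- c(small = length(serialize(small, NULL)),
           large = length(serialize(large, NULL)),
           lean  = length(serialize(lean, NULL)))
print(sizes)
```

In this sketch 'large' should come out at roughly 8 MB (one million doubles) although its body is identical to 'small', while 'lean' is small again because nothing from the defining frame goes with it. Re-parenting a function like this only makes sense when its body does not need anything from that frame.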
@pavelka-norman-4017
Last seen 9.6 years ago
Hi Mattia,

Maybe I'm not getting what you're trying to do, but shouldn't your fun3 be using object 'data' rather than 'Mat' internally?

HTH ;-)
Norman
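Norman's observation can be seen locally, with no cluster involved. This is a small sketch; fun3_original, fun3_fixed and 'other' are hypothetical names introduced here to compare the two function bodies.

```r
Mat <- matrix(2:10, 3, 3)

# Form used in the question: 'data' is accepted but never used; the body
# looks up 'Mat' through the function's enclosing environment.
fun3_original <- function(startInd, endInd = 3, data = Mat) {
  Mat[startInd:endInd, ]
}

# Form the comment points towards: the body uses its own argument.
fun3_fixed <- function(startInd, endInd = 3, data = Mat) {
  data[startInd:endInd, ]
}

other <- matrix(101:109, 3, 3)
from_original <- fun3_original(2, data = other)  # rows of Mat; 'data' is ignored
from_fixed    <- fun3_fixed(2, data = other)     # rows of 'other'
print(from_original)
print(from_fixed)
```

Note that the default value data = Mat is itself evaluated lazily, at call time, starting from the function's environment. So on a worker the fixed form still needs 'Mat' to be findable unless 'data' is supplied explicitly in the call.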
Thanks Martin for the explanations, and thanks Norman for pointing out that error in the example.

Unfortunately I am still stuck with the main problem: I have to use clusterExport to export an object to the cluster nodes. clusterExport only seems to export objects from the GlobalEnv. In my case this object is created within a function and clusterExport is called within the same function, so the object is not available in the GlobalEnv and I get an error.

I'll try writing to the other mailing list,
thanks

mattia

On Sun, Apr 11, 2010 at 8:49 AM, Pavelka, Norman <nxp at stowers.org> wrote:
> Hi Mattia,
>
> Maybe I'm not getting what you're trying to do, but shouldn't your fun3 be using object 'data' rather than 'Mat' internally?
>
> HTH ;-)
> Norman
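For the situation described in this reply (exporting an object that exists only inside a function), one possible route is sketched below. It assumes the 'parallel' package bundled with R 2.14.0 and later, whose clusterExport takes an 'envir' argument; this is not the snow 0.3-3 used in the thread, whose documentation (quoted earlier) describes a lookup in the global environment only. The cluster size of 2 is arbitrary.

```r
library(parallel)  # assumption: base R's parallel package, not snow 0.3-3

fun2 <- function() {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))            # shut the workers down even on error

  Mat <- matrix(2:10, 3, 3)

  # envir = environment() makes clusterExport look 'Mat' up in this
  # function's frame instead of the master's global environment.
  clusterExport(cl, "Mat", envir = environment())

  fun3 <- function(startInd, endInd = 3) Mat[startInd:endInd, ]
  # Re-parent fun3 so that only the function travels; on each worker it
  # then finds the exported copy of 'Mat' in that worker's global environment.
  environment(fun3) <- globalenv()

  clusterApplyLB(cl, 1:3, fun3)
}

res <- fun2()
print(res)
```

Without the environment(fun3) line the call would also work, but for the reason given in Martin's first reply: 'Mat' would be shipped along with fun3's environment on every task, which makes the export redundant.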
On 04/12/2010 09:25 AM, mattia pelizzola wrote:
> thanks Martin for the explanations and thanks Norman for pointing out
> that error in the example,
>
> unfortunately I am still stuck with the main problem:
> I have to use clusterExport to export an object to the cluster nodes.

It's hard to know without a (simple) example why you have to use clusterExport. If the object is not defined in the environment of the function, then a sure-fire way of getting it to your cluster nodes is to explicitly include it in the clusterApplyLB call

  fun4=function(startInd, endInd=3, data) data[startInd:endInd,]
  clusterApplyLB(cl, 1:3, fun4, data=Mat)

Martin

> clusterExport only seems to export objects from the GlobalEnv,
> unfortunately. In my case this object is created within a function and
> clusterExport is called within the same function, so the object is not
> available in the GlobalEnv and I get error ..
>
> I'll try writing to the other mailing list,
> thanks
>
> mattia
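The suggestion above can be put into a complete, self-contained form as follows. This is a sketch under the assumption that base R's 'parallel' package is used (the thread itself uses snow, which provides functions of the same names); run_slices and its n_workers argument are hypothetical names for the wrapper.

```r
library(parallel)  # assumption: parallel instead of the thread's snow

# Worker function that takes the data as an explicit argument, so nothing
# has to be exported or found through an environment on the worker.
fun4 <- function(startInd, endInd = 3, data) data[startInd:endInd, ]

run_slices <- function(n_workers = 2) {   # hypothetical wrapper
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))
  Mat <- matrix(2:10, 3, 3)               # exists only inside this function
  # Extra named arguments are passed on to fun4 for every task.
  clusterApplyLB(cl, 1:3, fun4, data = Mat)
}

slices <- run_slices()
print(slices)
```

With this approach 'Mat' is sent with each task. For a small matrix that is harmless; for large objects, exporting once per worker may be the cheaper choice.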