multicore Vignette or HowTo??
Edwin Groot ▴ 230
@edwin-groot-3606
Last seen 9.6 years ago
Hello all,
I am having difficulty getting the multicore package to do what it promises. Does anybody have a benchmark that demonstrates something intensive with and without multicore assistance?
I have a dual dual-core Xeon, and $ top tells me all R can squeeze from my Linux system is 25% us. Here is my example:

> library(Starr)
# Read in a set of ChIP-chip arrays
> read("array.rda")
# $ top reports 25% us for the following:
> array_norm <- normalize.Probes(array, method = "loess")
# Try the same with multicore
> library(multicore)
> multicore:::detectCores()
[1] 4
# No benefit from multicore. $ top reports 25% us for the following:
> array_norm <- normalize.Probes(array, method = "loess")
# lattice masks parallel() from multicore. Use mcparallel() instead.
> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
Normalizing probes with method: loess
Done with 1 vs 2 in iteration 1
# The function continues for some time and displays more messages. No
# benefit from multicore. $ top reports 25% us during the run...
> array_norm <- collect(pnorm)
# Oh dear, where did my normalized data go?
> array_norm
$`4037`
NULL

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] tools     grid      stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] geneplotter_1.26.0   annotate_1.26.1      AnnotationDbi_1.10.2
 [4] Starr_1.4.4          affxparser_1.20.0    affy_1.26.1
 [7] Ringo_1.12.0         Matrix_0.999375-39   lattice_0.18-8
[10] limma_3.4.4          RColorBrewer_1.0-2   Biobase_2.8.0
[13] multicore_0.1-3

loaded via a namespace (and not attached):
 [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0
 [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14
 [7] RSQLite_0.9-2         splines_2.11.1        survival_2.35-8
[10] tcltk_2.11.1          xtable_1.5-6

RTFMing only gives me the syntax of some functions in the multicore package. How do I apply this successfully to my code?

Regards,
Edwin
--
Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945
Tim Triche ★ 4.2k
@tim-triche-3561
Last seen 3.6 years ago
United States
This may or may not help, but for truly independent calculations (e.g. reading and normalizing a pile of arrays), I find that writing a function that does the task end-to-end and then handing it off to mclapply(list.of.keys, function) typically gives a near-linear speedup. However, multicore is really not the most elegant way to do that sort of thing. If you look at what Benilton Carvalho has done in the 'oligo' package, you will see a far more memory-efficient approach that uses the 'ff' and 'bit' packages to share a (supercached) flat-file image and stride through the data in chunks.

Anyway, I'm dumb, so I just use mclapply(), keep my memory image small, run gc() a lot, and mull over using 'oligo'.

On Mon, Oct 18, 2010 at 9:05 AM, Edwin Groot <edwin.groot@biologie.uni-freiburg.de> wrote:
> <snip>

--
With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.
John von Neumann
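[Editor's note: a minimal sketch of the mclapply() pattern Tim describes, with one self-contained worker function per array. The directory, file names and the normalize.one() helper are hypothetical placeholders, not Starr code.]

library(multicore)

# hypothetical list of CEL files, one independent task per file
cel.files <- list.files("celfiles", pattern = "\\.cel$", full.names = TRUE)

# end-to-end worker: read and normalize a single array
# (stand-in body; substitute the real Starr/affy calls here)
normalize.one <- function(f) {
    file.info(f)$size        # placeholder for the real result
}

# one forked child per list element, spread over the available cores
norm.list <- mclapply(cel.files, normalize.one,
                      mc.cores = multicore:::detectCores())

The key point is that each normalize.one() call must be independent of the others; mclapply() then forks the children and collects the results into an ordinary list in input order.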
I forgot to mention that the 'oligo' package uses 'snow' to coordinate the processing. It's more of a pain than just spawning a load of child processes with multicore and options("cores"=23) -- one less than the number actually present usually works for me -- but at the end of the day it is also far more scalable. 'multicore' is, for lack of a better phrase, an incredibly useful kludge. No disrespect to the authors of the package; Unix is a kludge in many respects too, and it still hasn't been superseded.

On Mon, Oct 18, 2010 at 9:34 AM, Tim Triche <tim.triche@gmail.com> wrote:
> <snip>
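[Editor's note: a rough illustration of the snow-based approach mentioned above -- generic snow usage, not the actual machinery inside 'oligo'. A socket cluster spreads work over separate R worker processes.]

library(snow)

cl <- makeCluster(4, type = "SOCK")         # 4 worker R processes
chunks <- split(1:20, rep(1:4, each = 5))   # e.g. 20 tasks in 4 chunks
process.chunk <- function(idx) sum(idx)     # stand-in for real per-chunk work

res <- parLapply(cl, chunks, process.chunk)
stopCluster(cl)

Unlike multicore's forked children, the snow workers are independent processes that could just as well sit on other machines, which is where the extra scalability comes from.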
On Mon, 18 Oct 2010 09:34:31 -0700, Tim Triche <tim.triche at gmail.com> wrote:
> This may or may not help, but for truly independent calculations (e.g.
> reading and normalizing a pile of arrays) I find that writing a
> function to do the task end-to-end and then handing it off to
> mclapply(list.of.keys, function) typically results in a near-linear
> speedup.
> <snip>

Hello Tim,
Well, I am dumber. How do I set up my data so that your suggestion of mclapply(list.of.keys, function) works under the multicore package? My inkling is that if I had 20 scanner files and 4 CPU cores, it would have something to do with a list of 4 vectors of 5 elements each. What would such code look like?
Thanks for the gc() tip.

Edwin
Hi Edwin,

this has probably been answered already by now, but in case it hasn't, the solution is simple:

library("multicore")
options("cores" = 4)
result <- mclapply(<your_complete_list_of_data_objects>,
                   <your_function_to_process_a_single_data_object>)

There is no need to create extra vectors or anything of the sort: mclapply will distribute the computation on each individual object to one processor each, so four objects will be treated in parallel at a time.

Regards,
Joern

On Wed, 20 Oct 2010 16:24:25 +0200, Edwin Groot wrote:
> Well, I am dumber. How do I set up my data so that your suggestion of
> mclapply(list.of.keys, function) would work under the multicore package?
> <snip>

---
Joern Toedling
Institut Curie -- U900
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246927
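[Editor's note: to put Joern's point in terms of the 20-files/4-cores case Edwin asked about -- no manual chunking into four vectors of five is needed, because mclapply() schedules the 20 list elements over the 4 cores on its own. The file names and process.file() below are hypothetical stand-ins.]

library(multicore)
options(cores = 4)

scanner.files <- sprintf("scan_%02d.cel", 1:20)   # hypothetical file names
process.file  <- function(f) nchar(f)             # stand-in for read + normalize

res <- mclapply(scanner.files, process.file)
length(res)   # 20 results, returned in the same order as the input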
Martin Morgan
@martin-morgan-1513
Last seen 5 days ago
United States
On 10/18/2010 09:05 AM, Edwin Groot wrote:
> Hello all,
> I have difficulty getting the multicore package doing what it promises.
> Does anybody have a benchmark that demonstrates something intensive
> with and without multicore assistance?
> <snip>
>> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))

Here's my favorite test of parallel functionality:

> library(multicore)
> system.time(lapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.001   0.000   4.004
> system.time(mclapply(1:4, function(i) Sys.sleep(1)))
   user  system elapsed
  0.007   0.005   1.009

Time goes 4x faster! Code has to be multicore-aware, and saying something like

pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
array_norm <- collect(pnorm)

just says to fork a process to do the task, not to do the task in parallel (multicore doesn't do anything clever, like identify parts of the code that could be parallelized). The Starr author would have to implement normalize.Probes to take advantage of multiple cores, or your own task would have to be parallelizable at the 'user' level, like an lapply.

I'm really not sure why array_norm is NULL. After looking at the example on ?normalize.Probes I did

dataPath <- system.file("extdata", package = "Starr")
bpmapChr1 <- readBpmap(file.path(dataPath, "Scerevisiae_tlg_chr1.bpmap"))
cels <- c(file.path(dataPath, "Rpb3_IP_chr1.cel"),
          file.path(dataPath, "wt_IP_chr1.cel"),
          file.path(dataPath, "Rpb3_IP2_chr1.cel"))
names <- c("rpb3_1", "wt_1", "rpb3_2")
type <- c("IP", "CONTROL", "IP")
rpb3Chr1 <- readCelFile(bpmapChr1, cels, names, type,
                        featureData = TRUE, log.it = TRUE)

and then (not expecting to see any speed improvement, for the reason outlined above)

> job <- mcparallel(normalize.Probes(rpb3Chr1, method = "rankpercentile"))
> job
parallelJob: processID=12120
> collect(job)
$`12120`
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20000 features, 3 samples
  element names: exprs
protocolData: none
phenoData
  sampleNames: rpb3_1 wt_1 rpb3_2
  varLabels: type CEL
  varMetadata: labelDescription
featureData
  featureNames: 1 2 ... 20000 (20000 total)
  fvarLabels: chr seq pos
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:

Martin
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
On Mon, 18 Oct 2010 09:39:22 -0700, Martin Morgan <mtmorgan at fhcrc.org> wrote:
> Here's my favorite test of parallel functionality
>
> > system.time(lapply(1:4, function(i) Sys.sleep(1)))
>    user  system elapsed
>   0.001   0.000   4.004
> > system.time(mclapply(1:4, function(i) Sys.sleep(1)))
>    user  system elapsed
>   0.007   0.005   1.009
>
> time goes 4x faster!

Hmm, a great parlour trick!

> Code has to be multicore-aware, and saying something like
>
> pnorm <- mcparallel(normalize.Probes(array, method = "loess"))
> array_norm <- collect(pnorm)
>
> just says to fork a process to do the task, not to do the task in
> parallel (multicore doesn't do anything clever, like identify parts
> of the code that could be parallelized).

Aha, so I have been using the multicore package ignorantly. It shows how little I know about what happens under the hood of the software. I asked this clueless question in the first place because I needed some real data and code that demonstrated the principle of parallel computation. What I gave as an example was trivial, since it is a single process, right?

If I understand this correctly, I have to find a way to split my data into parts (up to 4 in my case) and have mcparallel() distribute the load? Hmm, but that would not work for normalization, because all the information from the data set is needed. Now what?

> I'm really not sure why array_norm is NULL.

I think I entered array_norm <- collect(pnorm) twice, which probably throws out the contents from the first collect() call.

<snip>

Edwin
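[Editor's note: as a footnote to Edwin's guess about the double collect(), a minimal mcparallel()/collect() round trip that collects the job exactly once looks like the sketch below; the expression is plain base R, not Starr.]

library(multicore)

job <- mcparallel(sqrt(1:10))   # fork one child to evaluate the expression
res <- collect(job)             # block until the child finishes
res[[1]]                        # the value, keyed by the child's process ID

Calling collect() a second time on the same job after its result has already been delivered is one plausible way to end up with a NULL like the one in the original post.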