Question: multicore and GRangesList
arne.mueller@novartis.com (Switzerland), 6.9 years ago:
Dear All,

Does anybody have experience using the multicore package with GRangesList objects from the GenomicRanges package? I can't get it to work. Here's an example:

> a = GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
> b = GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
> grl = GRangesList(a, b)
> sapply(grl, length)
[1] 3 3
> mclapply(grl, length, mc.cores=2)
[[1]]
[1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

[[2]]
[1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

> sessionInfo()
R version 2.13.0 Under development (unstable) (2010-12-20 r53870)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] multicore_0.1-3 GenomicRanges_1.3.7 IRanges_1.9.17

loaded via a namespace (and not attached):
[1] tools_2.13.0

Thanks a lot for any hints.

arne
Question written 6.9 years ago by arne.mueller@novartis.com; last modified 6.9 years ago by Martin Morgan.
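The error message shows that mclapply() called as.list() on its input, and that this fell through to as.list.default(), which cannot coerce an S4 object that has no as.list method. The sketch below reproduces the same failure in base R and shows that supplying an as.list method is enough for mclapply() to proceed. It assumes a hypothetical S4 class RangesBag standing in for a GRangesList, and uses parallel::mclapply (the successor of the multicore package's mclapply) rather than multicore itself.

```r
library(methods)
library(parallel)

# Hypothetical S4 container standing in for a GRangesList: it stores its
# elements in a slot but does not itself extend "list".
setClass("RangesBag", slots = c(elements = "list"))

bag <- new("RangesBag", elements = list(1:3, c(10L, 20L, 30L)))

# With no as.list method, as.list() falls through to as.list.default(),
# which fails on an S4 object; capture the message instead of stopping.
coerce_error <- tryCatch(as.list(bag), error = function(e) conditionMessage(e))
print(coerce_error)

# Hypothetical fix: an S3 as.list method for the class, so the coercion
# that mclapply() performs on its input now succeeds.
as.list.RangesBag <- function(x, ...) x@elements

# mc.cores > 1 is not supported on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(bag, length, mc.cores = cores)
print(res)
```

This only illustrates the mechanism; whether a given GenomicRanges release provides such a method for GRangesList is something to check against the installed version.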
Steve Lianoglou (Genentech), 6.9 years ago, wrote:
Hi,

On Tue, Jan 11, 2011 at 9:34 AM, arne.mueller@novartis.com wrote:
>> mclapply(grl, length, mc.cores=2)
> [[1]]
> [1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

This works with the foreach and doMC (which uses the multicore package) combo, if you're interested:

R> library(GenomicRanges)
R> library(doMC)
R> a <- GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
R> b <- GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
R> grl = GRangesList(a, b)
R> registerDoMC(2)
R> foreach(g=grl, .packages='GenomicRanges') %dopar% length(g)
[[1]]
[1] 3

[[2]]
[1] 3

R> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] doMC_1.2.1 multicore_0.1-3 foreach_1.3.0
[4] codetools_0.2-6 iterators_1.0.3 GenomicRanges_1.2.2
[7] IRanges_1.8.8

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Answer written 6.9 years ago by Steve Lianoglou.
Hello,

Thanks, all, for your replies. doMC works fine for me. Nevertheless, there seems to be some overhead at the beginning and at the end of the loop: the initial R process doesn't fork immediately, and when all child processes have finished, the parent keeps processing for quite some time before it returns the result.

Regards,
arne

From: Steve Lianoglou <mailinglist.honeypot@gmail.com>, 01/11/2011 04:49 PM
To: arne.mueller@novartis.com
cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] multicore and GRangesList
Reply written 6.9 years ago by arne.mueller@novartis.com.
On Tue, Jan 11, 2011 at 10:19 AM, arne.mueller@novartis.com wrote:
> Hello,
>
> thanks, all for your reply. The doMC works fine for me. Nevertheless, there seems to be some overhead since at the beginning and the end of the loop. The initial R-process doesn't fork immediately and when all child processes are finished the parent still processes for quite some time before it returns the result.

Right, there's going to be a lot of overhead in the subsetting etc., as Cory mentioned. The operation needs to be fairly long-running for the parallel speedup to be worth the overhead of splitting things up.
Reply written 6.9 years ago by Michael Lawrence.
Hi,

On Tue, Jan 11, 2011 at 1:19 PM, arne.mueller@novartis.com wrote:
> The initial R-process doesn't fork immediately and when all child processes are finished the parent still processes for quite some time before it returns the result.

Such is the price you pay for "easy" parallelization. If it takes you longer to split your jobs and then reduce/post-process the results than it does to just run the job serially, you might as well use good ol' lapply.

-steve
Reply written 6.9 years ago by Steve Lianoglou.
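Steve's and Michael's advice amounts to measuring before parallelizing. A minimal sketch of such a check in base R follows; compare_serial_parallel() is a hypothetical helper, not part of any package, and parallel::mclapply stands in for the multicore version used in the thread. It runs the same job with lapply() and mclapply(), confirms the results agree, and reports elapsed times, so you can see whether fork and result-collection overhead outweighs the work per element.

```r
library(parallel)

# Hypothetical helper: run FUN over X serially and in parallel, check that
# both give the same result, and return the elapsed wall-clock times.
compare_serial_parallel <- function(X, FUN, ..., mc.cores = 2L) {
  # Forking is unavailable on Windows, so mclapply() needs one core there.
  if (.Platform$OS.type == "windows") mc.cores <- 1L
  serial_sec <- system.time(
    serial_res <- lapply(X, FUN, ...)
  )[["elapsed"]]
  parallel_sec <- system.time(
    parallel_res <- mclapply(X, FUN, ..., mc.cores = mc.cores)
  )[["elapsed"]]
  list(same_result  = identical(serial_res, parallel_res),
       serial_sec   = serial_sec,
       parallel_sec = parallel_sec)
}

# Cheap per-element work: overhead usually dominates, so lapply() tends to win.
cheap <- compare_serial_parallel(1:2000, function(i) i * 2)

# Expensive per-element work (simulated with Sys.sleep): parallelism can pay off.
slow <- compare_serial_parallel(1:4, function(i) { Sys.sleep(0.2); i })

print(cheap)
print(slow)
```

Timings depend on the machine, so treat the numbers as a guide for your own data rather than a general rule.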
Martin Morgan (United States), 6.9 years ago, wrote:
On 01/11/2011 06:34 AM, arne.mueller@novartis.com wrote:
>> mclapply(grl, length, mc.cores=2)
> [[1]]
> [1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

A hack is

assignInNamespace("lapply", lapply, "base")

and then

idx <- seq_len(1000)
res3 <- mclapply(tx[idx], length)

This is about 6x faster than Cory's

mclapply(idx, function(i, grl) length(grl[[i]]), tx[idx])

because the lapply,GRangesList method is more efficient at extracting ranges than [[ (maybe less validity checking?). I think this should scale with the number of cores, but for whatever reason all my processes stay on the same CPU.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
Answer written 6.9 years ago by Martin Morgan.
Stefano Calza, 6.9 years ago, wrote:
Hi,

you cannot do that. The lapply, sapply, etc. methods for GRangesList are specific methods, not the standard ones.

I've been doing it, but as far as I know it requires some hacking (unless you coerce the object to a list, which took a long time for me, and I didn't like that).

Maybe someone has a better suggestion.

Stefano

On Tue, Jan 11, 2011 at 03:34:46PM +0100, arne.mueller@novartis.com wrote:
> Has anybody experience using the multicore package with GRangesLists from the GenomicRanges package?

-- 
Stefano Calza, PhD
Researcher/Assistant Professor - Biostatistician

Sezione di Statistica Medica e Biometria
Dipartimento di Scienze Biomediche e Biotecnologie
Università degli Studi di Brescia - Italy
Viale Europa, 11 25123 Brescia

email: stefano.calza@med.unibs.it
stefano.calza@biostatistics.it
pec: stefano.calza@pec.biostatistics.it
Phone: +390303717653
Fax: +390303717488
Answer written 6.9 years ago by Stefano Calza.
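Stefano's workaround, coercing to an ordinary list once before parallelizing, can be sketched in base R as below. RangesBag and bag_as_list() are hypothetical stand-ins; for a real GRangesList the coercion step would presumably be as.list(grl), which, as Stefano notes, can itself be slow on large objects. parallel::mclapply is used in place of the multicore package.

```r
library(methods)
library(parallel)

# Hypothetical S4 container standing in for a GRangesList.
setClass("RangesBag", slots = c(elements = "list"))

# Hypothetical helper: pull out an ordinary R list once, in the parent process.
bag_as_list <- function(x) x@elements

bag <- new("RangesBag", elements = list(a = 1:3, b = c(10L, 20L, 30L)))

# Pay the coercion cost up front; mclapply() then only ever sees a plain list.
plain <- bag_as_list(bag)

# mc.cores > 1 is not supported on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(plain, length, mc.cores = cores)
print(unlist(res))
```

The trade-off is the one Stefano describes: the coercion is a serial step, so it only makes sense when the per-element work done afterwards is substantial.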
I think, largely to get around this issue, some methods were written to handle common cases where lapply-ing over a GRangesList might be most intuitive. For example, this should work quickly on your example:

elementLengths(grl)

In other situations, you can lapply over the indices or names of the GRangesList. However, the subsetting can add overhead larger than the benefit of parallelization.

-Cory

On Tue, Jan 11, 2011 at 6:52 AM, Stefano Calza <stefano.calza@med.unibs.it> wrote:
> Hi,
>
> you cannot do that. lapply, sapply, ecc. methods for GRangesLists are specific, not standard ones.
Reply written 6.9 years ago by Cory Barr.
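Cory's two suggestions can be sketched in base R, with a plain named list standing in for the GRangesList. This assumes the GRangesList supports length(), names() and [[ in the same way. Base R's lengths() plays the role of elementLengths() here; as far as I know, later Bioconductor releases replaced elementLengths() with elementNROWS() and also support lengths() directly, so check the documentation of your installed version. parallel::mclapply stands in for multicore's mclapply.

```r
library(parallel)

# A plain named list standing in for a GRangesList (assumption, see above).
x <- list(a = 1:3, b = c(10L, 20L, 30L), c = 5:6)

# Vectorized per-element lengths, with no loop at all.
per_element <- lengths(x)

# mc.cores > 1 is not supported on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Iterate over indices: x is passed through mclapply's ... argument and each
# call subsets it with [[.
by_index <- mclapply(seq_along(x), function(i, x) length(x[[i]]), x,
                     mc.cores = cores)

# Iterate over names instead; mclapply() does not name results taken from a
# character vector, so restore the names afterwards.
by_name <- mclapply(names(x), function(nm, x) length(x[[nm]]), x,
                    mc.cores = cores)
names(by_name) <- names(x)

print(per_element)
print(unlist(by_index))
print(unlist(by_name))
```

The per-element [[ calls are exactly the subsetting overhead Cory and Martin mention, which is why a vectorized summary is preferable whenever one exists.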