multicore and GRangesList
@arnemuellernovartiscom-2205
Switzerland
Dear All,

Has anybody experience using the multicore package with GRangesLists from the GenomicRanges package? I can't get it working ..., here's an example:

> a = GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
> b = GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
> grl = GRangesList(a, b)
> sapply(grl, length)
[1] 3 3
> mclapply(grl, length, mc.cores=2)
[[1]]
[1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

[[2]]
[1] "Error in as.list.default(X) : \n no method for coercing this S4 class to a vector\n"

> sessionInfo()
R version 2.13.0 Under development (unstable) (2010-12-20 r53870)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] multicore_0.1-3 GenomicRanges_1.3.7 IRanges_1.9.17

loaded via a namespace (and not attached):
[1] tools_2.13.0

thanks a lot for hints ..

arne
@steve-lianoglou-2771
United States
Hi,

On Tue, Jan 11, 2011 at 9:34 AM, <arne.mueller at novartis.com> wrote:
> Dear All,
>
> Has anybody experience using the multicore package with GRangesLists from
> the GenomicRanges package? I can't get it working ..., here's an example:
>
>> a = GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
>> b = GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
>> grl = GRangesList(a, b)
>> sapply(grl, length)
> [1] 3 3
>> mclapply(grl, length, mc.cores=2)
> [[1]]
> [1] "Error in as.list.default(X) : \n no method for coercing this S4
> class to a vector\n"

This works with the foreach and doMC (which uses the multicore package) combo, if you're interested:

R> library(GenomicRanges)
R> library(doMC)
R> a <- GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
R> b <- GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
R> grl = GRangesList(a, b)
R> registerDoMC(2)
R> foreach(g=grl, .packages='GenomicRanges') %dopar% length(g)
[[1]]
[1] 3

[[2]]
[1] 3

R> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] doMC_1.2.1 multicore_0.1-3 foreach_1.3.0
[4] codetools_0.2-6 iterators_1.0.3 GenomicRanges_1.2.2
[7] IRanges_1.8.8

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Hello,

thanks, all, for your replies. doMC works fine for me. Nevertheless, there seems to be some overhead at the beginning and the end of the loop. The initial R process doesn't fork immediately, and when all child processes are finished the parent still keeps processing for quite some time before it returns the result.

regards,

arne
On Tue, Jan 11, 2011 at 10:19 AM, <arne.mueller@novartis.com> wrote:
> Hello,
>
> thanks, all for your reply. The doMC works fine for me. Nevertheless,
> there seems to be some overhead since at the beginning and the end of the
> loop. The initial R-process doesn't fork immediately and when all child
> processes are finished the parent still processes for quite some time
> before it returns the result.

Right, there's going to be a lot of overhead in the subsetting etc., as Cory mentioned. The operation needs to be fairly long-running for it to be worth the overhead of splitting things up.
Hi,

On Tue, Jan 11, 2011 at 1:19 PM, <arne.mueller@novartis.com> wrote:
> Hello,
>
> thanks, all for your reply. The doMC works fine for me. Nevertheless,
> there seems to be some overhead since at the beginning and the end of the
> loop. The initial R-process doesn't fork immediately and when all child
> processes are finished the parent still processes for quite some time before
> it returns the result.

Such is the price you pay for "easy" parallelization. If it takes you longer to split your jobs + reduce/post process the result than it does to just run the job linearly, then you might as well use good ol' lapply.

-steve
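To put a number on that trade-off, here is a minimal timing sketch. It is not from the thread: it assumes the parallel package that ships with current R (the successor to multicore), and the workload `cheap` and the input list `xs` are made up for illustration. For a per-element task this cheap, the forked version will often be no faster than plain lapply, which is the point being made above.

```r
library(parallel)

# Hypothetical toy workload: a very cheap per-element function
cheap <- function(x) length(x)
xs <- replicate(2000, seq_len(5), simplify = FALSE)

# Forked workers are not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t_serial   <- system.time(res_serial   <- lapply(xs, cheap))
t_parallel <- system.time(res_parallel <- mclapply(xs, cheap, mc.cores = cores))

# Both approaches give the same answer; only the elapsed time differs
stopifnot(identical(res_serial, res_parallel))

elapsed <- c(serial   = t_serial[["elapsed"]],
             parallel = t_parallel[["elapsed"]])
print(elapsed)
```

Timings vary by machine, so no expected numbers are shown. Replacing `cheap` with something that runs for a noticeable time per element is a quick way to see where splitting the work starts to pay off.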
@martin-morgan-1513
United States
On 01/11/2011 06:34 AM, arne.mueller at novartis.com wrote:
> Dear All,
>
> Has anybody experience using the multicore package with GRangesLists from
> the GenomicRanges package? I can't get it working ..., here's an example:
>
>> a = GRanges(seqnames="A", ranges=IRanges(start=1:3, width=5))
>> b = GRanges(seqnames="A", ranges=IRanges(start=c(10,20,30), width=5))
>> grl = GRangesList(a, b)
>> sapply(grl, length)
> [1] 3 3
>> mclapply(grl, length, mc.cores=2)
> [[1]]
> [1] "Error in as.list.default(X) : \n no method for coercing this S4
> class to a vector\n"

A hack is

  assignInNamespace("lapply", lapply, "base")

and then

  idx <- seq_len(1000)
  res3 <- mclapply(tx[idx], length)

this is about 6x faster than Cory's

  mclapply(idx, function(i, grl) length(grl[[i]]), tx[idx])

because lapply,GRangesList is being more efficient at extracting ranges than [[ (maybe less validity checking?). I think this should scale with the number of cores, but for whatever reason all my processes stay on the same cpu.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
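A less invasive alternative to patching the base namespace is to do the list coercion yourself before calling mclapply, so that mclapply never has to coerce the S4 object. The sketch below is not from the thread and rests on two assumptions: a current R with the parallel package (the successor to multicore), and a GenomicRanges version in which as.list() works on a GRangesList. The coercion can itself be slow for large objects, so this is only worth it when the per-element work is expensive.

```r
library(GenomicRanges)
library(parallel)

# Same toy data as in the original question
a <- GRanges(seqnames = "A", ranges = IRanges(start = 1:3, width = 5))
b <- GRanges(seqnames = "A", ranges = IRanges(start = c(10, 20, 30), width = 5))
grl <- GRangesList(a, b)

# Coerce once, up front, to an ordinary list of GRanges objects
plain <- as.list(grl)

# Forked workers are not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mclapply now receives a plain list and needs no S4 coercion of its own
res <- mclapply(plain, length, mc.cores = cores)
print(unlist(res))
```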
@stefano-calza-4428
Hi,

you cannot do that. The lapply, sapply, etc. methods for GRangesLists are specific ones, not the standard ones.

I've been using it, but afaik it requires some hacking (unless you coerce the object to a list, which took a long time for me, and which I didn't like).

Maybe someone has a better suggestion.

Stefano

--
Stefano Calza, PhD
Researcher/Assistant Professor - Biostatistician

Sezione di Statistica Medica e Biometria
Dipartimento di Scienze Biomediche e Biotecnologie
Università degli Studi di Brescia - Italy
Viale Europa, 11 25123 Brescia

email: stefano.calza at med.unibs.it
       stefano.calza at biostatistics.it
pec: stefano.calza at pec.biostatistics.it

Phone: +390303717653
Fax: +390303717488
I think largely to get around this issue, some methods were written to handle common cases where lapplying on a GRangesList might be most intuitive. For example, this should work quickly on your provided example:

  elementLengths(grl)

In other situations, you can lapply over the indices or names of the GRangesList. However, the subsetting can lead to overhead larger than the parallelization benefits.

-Cory
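The "lapply over the indices" idea can be sketched as follows. This is an illustration rather than code from the thread, and it assumes a current R with the parallel package (the successor to multicore) alongside GenomicRanges. Each worker subsets the GRangesList with [[ itself, which is exactly the subsetting overhead mentioned above, so for something as cheap as length() the vectorized accessor is the better tool. As far as I know, elementLengths() was later renamed elementNROWS() in Bioconductor, so check which one your installed version provides.

```r
library(GenomicRanges)
library(parallel)

# Same toy data as in the original question
a <- GRanges(seqnames = "A", ranges = IRanges(start = 1:3, width = 5))
b <- GRanges(seqnames = "A", ranges = IRanges(start = c(10, 20, 30), width = 5))
grl <- GRangesList(a, b)

# Forked workers are not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Iterate over integer indices (a plain vector), not over the S4 object;
# each call extracts one GRanges with [[ and measures it
res <- mclapply(seq_along(grl),
                function(i) length(grl[[i]]),
                mc.cores = cores)
print(unlist(res))
```

Iterating over `names(grl)` instead of `seq_along(grl)` works the same way when the list is named.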