Why is *ply-ing over a GRangesList much slower than *ply-ing over an IRangesList?
@martin-morgan-1513
Last seen 5 months ago
United States
On 08/24/2010 07:31 PM, Steve Lianoglou wrote:
> Hi,
>
> Looping using any of the *ply (lapply, sapply, seqapply, etc.) seems
> to be significantly slower when you are iterating over a GRangesList
> vs. an IRangesList:
>
> R> library(GenomicFeatures)
> R> txdb <- loadFeatures(system.file("extdata", "UCSC_knownGene_sample.sqlite",
>                                    package="GenomicFeatures"))
> R> xcripts <- transcriptsBy(txdb, 'gene')
> R> system.time(l1 <- sapply(xcripts, length))
>    user  system elapsed
>   2.298   0.003   2.302
>
> irl <- IRangesList(lapply(xcripts, ranges))
> system.time(l2 <- sapply(irl, length))
>    user  system elapsed
>   0.047   0.001   0.049

As an update, Patrick has improved performance 10x-ish in IRanges
1.7.40, still some more to go...

> replicate(5, system.time(lapply(xcripts, length)))
           [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.31 0.317 0.318 0.313 0.328
sys.self   0.00 0.002 0.000 0.002 0.000
elapsed    0.31 0.325 0.319 0.317 0.329
user.child 0.00 0.000 0.000 0.000 0.000
sys.child  0.00 0.000 0.000 0.000 0.000

> irl <- IRangesList(lapply(xcripts, ranges))
> replicate(5, system.time(lapply(irl, length)))
            [,1]  [,2]  [,3]  [,4]  [,5]
user.self  0.032 0.031 0.032 0.031 0.030
sys.self   0.000 0.000 0.000 0.001 0.001
elapsed    0.032 0.031 0.032 0.032 0.031
user.child 0.000 0.000 0.000 0.000 0.000
sys.child  0.000 0.000 0.000 0.000 0.000

Martin

> R> identical(l1, l2)
> [1] TRUE
>
> I was curious if this is known/expected behavior and it's unavoidable, or .. ?
>
> Thanks,
> -steve
>
> R> sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-08-21 r52791)
> Platform: i386-apple-darwin10.4.0/i386 (32-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] org.Hs.eg.db_2.4.1 RSQLite_0.9-2 DBI_0.2-5 AnnotationDbi_1.11.4
> [5] Biobase_2.9.0 GenomicFeatures_1.1.11 GenomicRanges_1.1.20 IRanges_1.7.21
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.17.6 Biostrings_2.17.29 RCurl_1.4-3 XML_3.1-1 biomaRt_2.5.1
> [6] rtracklayer_1.9.7 tools_2.12.0

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
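When only the per-element lengths are needed, as in the timings above, the loop can be avoided altogether. The following is a minimal sketch rather than anything proposed in the thread: the toy `grl` object and its gene labels are made up to stand in for `xcripts`, and it assumes a current GenomicRanges/S4Vectors installation where the accessor is called `elementNROWS()` (as far as I recall it was `elementLengths()` in the IRanges versions discussed here).

```r
# Sketch only: a small, made-up GRangesList standing in for `xcripts`.
suppressPackageStartupMessages(library(GenomicRanges))

gr <- GRanges(
    seqnames = c("chr1", "chr1", "chr2", "chr2", "chr2"),
    ranges   = IRanges(start = c(1, 50, 10, 100, 200), width = 20),
    strand   = c("+", "+", "-", "-", "-")
)

# Group the ranges by a hypothetical gene label to get a GRangesList.
grl <- split(gr, c("geneA", "geneA", "geneB", "geneB", "geneB"))

# Element-wise iteration: every element is extracted as its own GRanges
# object before length() is called on it.
n_loop <- sapply(grl, length)

# Vectorized accessor: returns all element lengths in a single call,
# without materializing the individual GRanges elements.
n_vec <- elementNROWS(grl)

# Both approaches should agree on the counts.
stopifnot(all(n_loop == n_vec))
```

For anything that cannot be expressed with a vectorized accessor, the per-element cost of extracting each GRanges still applies, which is the overhead the timings in this thread measure.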
@steve-lianoglou-2771
Last seen 22 months ago
United States
On Thu, Oct 14, 2010 at 5:55 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
<snip>
> As an update, Patrick has improved performance 10x-ish in IRanges
> 1.7.40, still some more to go...
>
>> replicate(5, system.time(lapply(xcripts, length)))
>            [,1]  [,2]  [,3]  [,4]  [,5]
> user.self  0.31 0.317 0.318 0.313 0.328
> sys.self   0.00 0.002 0.000 0.002 0.000
> elapsed    0.31 0.325 0.319 0.317 0.329
> user.child 0.00 0.000 0.000 0.000 0.000
> sys.child  0.00 0.000 0.000 0.000 0.000
>
>> irl <- IRangesList(lapply(xcripts, ranges))
>
>> replicate(5, system.time(lapply(irl, length)))
>             [,1]  [,2]  [,3]  [,4]  [,5]
> user.self  0.032 0.031 0.032 0.031 0.030
> sys.self   0.000 0.000 0.000 0.001 0.001
> elapsed    0.032 0.031 0.032 0.032 0.031
> user.child 0.000 0.000 0.000 0.000 0.000
> sys.child  0.000 0.000 0.000 0.000 0.000

Awesome!

Thanks for dumping some brain power into this.

Out of curiosity: I have several lists of serialized GRanges objects
which I had to regenerate with the introduction of isCircular (or
whatever it was) because of binary incompatibility with old/new
versions of GRanges.

Do these updates break any binary compatibility or anything? I'm not
complaining, I just want to make sure I avoid updating until I can get
"out of the woods" and find time to regenerate these things ;-).

Thanks,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
On 10/14/2010 04:04 PM, Steve Lianoglou wrote:
<snip>
> Out of curiosity: I have several lists of serialized GRanges objects
> which I had to regenerate with the introduction of isCircular (or
> whatever it was) because of binary incompatibility with old/new
> versions of GRanges.
>
> Do these updates break any binary compatibility or anything? I'm not
> complaining, I just want to make sure I avoid updating until I can get
> "out of the woods" and find time to regenerate these things ;-).

No, the speed-up did not involve changes in class structure.

Have you tried updateObject on your objects?

Martin
On Thu, Oct 14, 2010 at 11:07 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
<snip>
> No, the speed-up did not involve changes in class structure.

Nice.

> Have you tried updateObject on your objects?

No (I didn't even know it was there *blush*).

It's not exactly clear to me how I would have done that, though. If I
remember correctly, R was failing inside the load() call, so I didn't
have a chance to updateObject() anything ... does that make sense?

Imagine I had a file called "genes.rda" which consisted of one object:
a list of GRanges objects called `genes`.

I thought I was getting an error right after load("genes.rda"). Can I
suppress validity checks for a minute while I load "genes.rda", then
`genes <- lapply(genes, updateObject)`, or something?

-steve
In my experience you can load the invalid object; just don't try to
validate or evaluate it before updateObject is run. If you can't load
it, it could be interesting to know why, so provide more details if
you run into this.

On Fri, Oct 15, 2010 at 12:10 AM, Steve Lianoglou <mailinglist.honeypot at gmail.com> wrote:
<snip>
> It's not exactly clear to me how I would have done that, though. If I
> remember correctly R was failing inside the load() call, so I didn't
> have a chance to updateObject() anything ... does that make sense?
>
> Imagine I had a file called "genes.rda" which consisted of one object:
> a list of GRanges objects called `genes`.
>
> I thought I was getting an error right after load("genes.rda"). Can I
> suppress validity checks for a minute while I load "genes.rda", then
> `genes <- lapply(genes, updateObject)`, or something?
>
> -steve
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
On Fri, Oct 15, 2010 at 12:29 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> in my experience you can load the invalid object, just don't try to
> validate or evaluate it before updateObject is run.
> if you can't load it could be interesting to know why, so provide more
> details if you run into this.

No, you're right. I dug up an old such list-of-GRanges object and was
able to essentially `updated <- lapply(old.list, updateObject)` it
into shape.

Sorry for the confusion.

-steve
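The load-then-update pattern that resolved the question can be sketched as follows. This is a minimal illustration under stated assumptions, not a record of what was actually run: the two toy GRanges objects, the temporary file path, and the scratch environment `env` are all hypothetical, and since the objects are created with the installed package version, `updateObject()` has nothing to repair here. With a genuinely outdated file, only the `load()` and `lapply()` steps would be needed.

```r
# Sketch only: builds a stand-in "genes.rda" so the example is self-contained.
suppressPackageStartupMessages(library(GenomicRanges))

genes <- list(
    a = GRanges("chr1", IRanges(start = 1, end = 10)),
    b = GRanges("chr2", IRanges(start = 5, end = 25))
)
rda <- file.path(tempdir(), "genes.rda")
save(genes, file = rda)
rm(genes)

# Load into a scratch environment so nothing in the workspace is overwritten.
# As far as I know, load() only deserializes and does not run validity checks.
env <- new.env()
load(rda, envir = env)

# Update every element before printing or otherwise evaluating it.
genes <- lapply(env$genes, updateObject)

# Only now check validity, then write the refreshed copy back out.
invisible(lapply(genes, validObject))
save(genes, file = rda)
```

Loading into a separate environment is a design choice rather than a requirement; a plain `load("genes.rda")` followed by the same `lapply()` call follows the advice given in the thread equally well.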