slow graph methods
Martin Morgan
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Paul --

I think Seth's speed-up came when he realized that the entire graph gets
copied when a single node or edge is added. So his solution was to
accumulate a bunch of changes (actually, the whole XML file) and then apply
them in a single call to, e.g., addEdge, using the fact that the arguments
from, to can be vectors. Seth accumulated the nodes / edges to be added in
an environment, which does not get copied (the way a list would) each time
it is changed. Maybe that perspective helps you to revise your code to add
nodes in batches, rather than individually?

Martin

Paul Shannon <pshannon at systemsbiology.org> writes:

> A couple of weeks ago, just before he left, Seth made a striking (800x)
> improvement to the speed of graph::fromGXL. This has been a big help.
>
> However, I'd like to construct graphs incrementally, and programmatically,
> without having to create new gxl files. Is it possible that Seth's
> efficiencies have not yet been applied to other methods in the graph
> class?
>
> In particular, it seems that the following operations take a longish time
> on graphs of a few thousand nodes:
>
>   edgeData
>   nodeData
>   addEdge
>   addNode
>
> I could provide actual times and test cases if that would help.
>
> Thanks!
>
>  - Paul
>
> On Sep 12, 2007, at 8:09 AM, Seth Falcon wrote:
>
>> Hi again,
>>
>> Seth Falcon <sfalcon at fhcrc.org> writes:
>>> Paul Shannon <pshannon at systemsbiology.org> writes:
>>>> It took nearly 24 hours (!) to create a 16k node graph using two
>>>> different techniques:
>>>>
>>>>   g = fromGXL (file ('someFile.gxl'))
>>
>> Using a patched version of fromGXL (on my laptop) I get:
>>
>>> library(graph)
>>> con = file("kegg.yeast.gxl", open="r")
>>> system.time(z <- fromGXL(con))
>>    user  system elapsed
>> 104.366   0.570 105.070
>>> z
>> A graphNEL graph with undirected edges
>> Number of Nodes = 15158
>> Number of Edges = 32668
>>> validObject(z)
>> [1] TRUE
>>
>> That's over 800x faster :-)
>>
>> I've checked in the changes in devel as graph 1.15.17. You can get it
>> from svn or wait till this time tomorrow (in either case, you will
>> need R 2.6.0 alpha).
>>
>> The code passes the existing unit tests, but some extra scrutiny that
>> the returned graphs are as desired is quite welcome.
>>
>> + seth
>>
>> --
>> Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
>> BioC: http://bioconductor.org/
>> Blog: http://userprimary.net/user/
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
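To make the batching idea concrete, here is a rough sketch (not code from
the graph package; the accumulator and the recordEdge helper are made up
for illustration) of collecting edges in an environment and then building
the graph with a single vectorized addEdge call:

    library(graph)

    ## Environments are modified in place, so growing the accumulator does
    ## not trigger a copy of the (eventually large) graph object.
    acc <- new.env(parent = emptyenv())
    acc$from <- character(0)
    acc$to   <- character(0)

    recordEdge <- function(a, b) {      # hypothetical helper, not in 'graph'
        acc$from <- c(acc$from, a)
        acc$to   <- c(acc$to, b)
    }

    ## ... call recordEdge() many times, e.g. while parsing input ...
    recordEdge("A", "B")
    recordEdge("B", "C")

    ## One batched construction instead of thousands of incremental calls;
    ## addEdge() takes vectors for 'from' and 'to'.
    allNodes <- unique(c(acc$from, acc$to))
    g <- new("graphNEL", nodes = allNodes, edgemode = "undirected")
    g <- addEdge(acc$from, acc$to, g, rep(1, length(acc$from)))

The same vectorized style should carry over to the nodeData and edgeData
replacement methods, which also accept vectors of node names and edge
endpoints.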
SystemsBiology graph • 1.2k views
Paul Shannon ★ 1.1k
@paul-shannon-578
Last seen 10.3 years ago
Hi Martin,

Great idea.  I'll try it.

Thanks!

 - Paul

On Sep 26, 2007, at 2:16 PM, Martin Morgan wrote:

> Seth accumulated nodes / edges to be added in an environment, which does
> not get copied (the way a list would) each time it is changed. Maybe that
> perspective helps you to revise your code to add nodes in batches, rather
> than individually?
Seth Falcon ★ 7.4k
@seth-falcon-992
Last seen 10.3 years ago
Paul Shannon <pshannon at systemsbiology.org> writes:

> Hi Martin,
>
> Great idea.  I'll try it.
>
> Thanks!
>
>  - Paul
>
> On Sep 26, 2007, at 2:16 PM, Martin Morgan wrote:
>
>> Seth accumulated nodes / edges to be added in an environment, which does
>> not get copied (the way a list would) each time it is changed. Maybe
>> that perspective helps you to revise your code to add nodes in batches,
>> rather than individually?

The code that I added for reading GXL files is in graph/R/gxlReader.R, in
the graph_handler function. You will have to "read through" the XML event
parsing, but perhaps it will give you a more concrete example of using
environments to accumulate the data needed for one or two big calls to the
graph functions, avoiding the incremental calls, which are slow due to
copying (as others have explained in this thread).

Best,

+ seth

--
Seth Falcon | seth at userprimary.net | blog: http://userprimary.net/user/
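For readers who want the flavor of that file without opening it, the
pattern looks roughly like the sketch below. This is not the real
graph_handler -- the handler is a stripped-down, hypothetical stand-in and
the GXL handling is simplified -- but it shows event-parsing callbacks
sharing an environment and a single graph construction at the end:

    library(XML)
    library(graph)

    ## Simplified stand-in for the accumulator pattern used in
    ## graph/R/gxlReader.R; element/attribute names follow the GXL format.
    makeHandler <- function() {
        env <- new.env(parent = emptyenv())
        env$nodes <- character(0)
        env$from  <- character(0)
        env$to    <- character(0)
        list(
            startElement = function(name, attrs, ...) {
                if (name == "node") {
                    env$nodes <- c(env$nodes, unname(attrs["id"]))
                } else if (name == "edge") {
                    env$from <- c(env$from, unname(attrs["from"]))
                    env$to   <- c(env$to, unname(attrs["to"]))
                }
            },
            asGraph = function() {
                g <- new("graphNEL", nodes = env$nodes,
                         edgemode = "undirected")
                addEdge(env$from, env$to, g, rep(1, length(env$from)))
            }
        )
    }

    h <- makeHandler()
    xmlEventParse("someFile.gxl", handlers = h)
    g <- h$asGraph()

Because the closures all reference the same environment, the per-element
callbacks stay cheap; the only expensive graph operations happen once, at
the end.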