bug in graph::edgeData()
Robert Castelo ★ 3.4k
@rcastelo
Last seen 2 days ago
Barcelona/Universitat Pompeu Fabra
hi,

the function edgeData() from the Bioconductor graph package seems to have a problem with the way it stores and retrieves edge attributes. After investigating the issue i think it has to do with setting and retrieving edge attributes for a subset of the edges, as opposed to doing it for all edges at once. here is a minimal example that reproduces the problem:

library(graph)

df <- data.frame(from=c("a", "b", "c"),
                 to=c("b", "c", "d"),
                 weight=rep(1, 3), stringsAsFactors=FALSE)
g <- graphBAM(df)
## this builds the undirected graph a-b-c-d

## set a new edge attribute "a" with 0 by default
edgeDataDefaults(g, attr="a") <- 0

## set the "a" attribute of all edges to 1
edgeData(g, from=df$from, to=df$to, attr="a") <- 1

## show the value of the "a" attribute for all edges,
## everything works as expected
edgeData(g, from=df$from, to=df$to, attr="a")
$`a|b`
[1] 1

$`b|c`
[1] 1

$`c|d`
[1] 1

## now repeat the operation but setting the "a" attribute
## only for the first edge a-b

g <- graphBAM(df)

edgeDataDefaults(g, attr="a") <- 0

edgeData(g, from=df$from[1], to=df$to[1], attr="a") <- 1

edgeData(g, from=df$from, to=df$to, attr="a")
$`a|b`
[1] 0

$`b|c`
[1] 1

$`c|d`
[1] 0

as you see, the value 1 is not set for the first edge "a|b" but for the second "b|c". if i repeat the operation setting the edge attribute "a" for the last two edges, it also goes wrong:

g <- graphBAM(df)

edgeDataDefaults(g, attr="a") <- 0

edgeData(g, from=df$from[2:3], to=df$to[2:3], attr="a") <- 1

edgeData(g, from=df$from, to=df$to, attr="a")
$`a|b`
[1] 1

$`b|c`
[1] 0

$`c|d`
[1] 1

here the attribute is set on edges "a|b" and "c|d", while it should have been set on "b|c" instead of "a|b".

i put my sessionInfo() below, which corresponds to the release version of the package, but i can also reproduce it in the devel version.

thanks!
robert.

R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] graph_1.36.2 vimcom_0.9-7 setwidth_1.0-3 colorout_0.9-9

loaded via a namespace (and not attached):
[1] BiocGenerics_0.4.0 stats4_2.15.1 tools_2.15.1
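The subset cases in the report can be folded into one small check, which may be handy for re-testing against a newer version of the package. This is only a sketch: checkSubsetAssignment() is a hypothetical helper, not part of the graph package, and it uses only the calls that appear in the report. It assumes that edgeData() returns the edges in the same order as the from/to vectors it is given, as in the output shown in the report.

```r
library(graph)

## Hypothetical helper (not part of the graph package).
## Builds a fresh graphBAM from 'df', sets attribute 'attr' to 'value'
## on the edges given by the row indices 'idx' only, and then compares
## what edgeData() reports for every edge with what was intended.
checkSubsetAssignment <- function(df, idx, attr="a", default=0, value=1) {
  g <- graphBAM(df)
  edgeDataDefaults(g, attr=attr) <- default
  edgeData(g, from=df$from[idx], to=df$to[idx], attr=attr) <- value

  ## retrieve the attribute for all edges, in the row order of 'df'
  ## (assumption: the returned list follows the order of from/to)
  got <- unlist(edgeData(g, from=df$from, to=df$to, attr=attr))

  ## intended values: 'default' everywhere except on the rows in 'idx'
  expected <- rep(default, nrow(df))
  expected[idx] <- value

  data.frame(edge=names(got),
             expected=expected,
             observed=unname(got),
             ok=expected == unname(got),
             stringsAsFactors=FALSE)
}

df <- data.frame(from=c("a", "b", "c"),
                 to=c("b", "c", "d"),
                 weight=rep(1, 3), stringsAsFactors=FALSE)

first <- checkSubsetAssignment(df, 1)      ## only edge a-b
lastTwo <- checkSubsetAssignment(df, 2:3)  ## edges b-c and c-d
print(first)
print(lastTwo)
```

A FALSE in the ok column marks an edge whose retrieved value differs from the one that was meant to be stored, so all(first$ok) and all(lastTwo$ok) give a quick yes/no answer for each case.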
Paul Shannon ▴ 750
@paul-shannon-5161
Last seen 10.2 years ago
Hi Robert,

Thanks for the bug report. I reproduced the problem and made some progress -- but not enough -- in unraveling the faulty logic which causes it. I offer here a workaround which might help you proceed with your work while I continue to work on the bug.

The workaround depends upon converting your graphBAM to a graphAM, as demonstrated below. If your graph is very large, this may not be practical.

In the code below, I reproduce the error, convert the graphBAM to a graphAM, then get the right result.

I will continue working on the bug. Let us know if this workaround is helpful.

 - Paul

library(graph)
df <- data.frame(from=c("a", "b", "c"),
                 to=c("b", "c", "d"),
                 weight=rep(1, 3), stringsAsFactors=FALSE)
g.orig <- graphBAM(df)
g <- g.orig
edgeDataDefaults(g, attr="a") <- 0
edgeData(g, from="a", to="b", attr="a") <- 1
edgeData(g, attr="a", from="a")
# $`a|b`
# [1] 1

edgeData(g, attr="a", from="a", to="b") # bug
# $`a|b`
# [1] 0

g <- as(g.orig, "graphAM")
edgeDataDefaults(g, attr="a") <- 0
edgeData(g, from="a", to="b", attr="a") <- 1
edgeData(g, attr="a", from="a")
# $`a|b`
# [1] 1

edgeData(g, attr="a", from="a", to="b") # no bug
# $`a|b`
# [1] 1

On Feb 15, 2013, at 2:49 AM, Robert Castelo wrote:

> hi,
>
> the function edgeData() from the Bioconductor graph package seems to have a problem with the way it stores and retrieves edge attributes.
> [...]
hi Paul,

thanks for the workaround, it will help me in the meantime, but i'm definitely more interested in the graphBAM class for the compact representation that it offers. i look forward to your news on this.

cheers,
robert.

On 02/16/2013 01:31 AM, Paul Shannon wrote:
> Hi Robert,
>
> Thanks for the bug report. I reproduced the problem and made some progress -- but not enough -- in unraveling the faulty logic which causes it. I offer here a workaround which might help you proceed with your work while I continue to work on the bug.
> [...]

--
Robert Castelo, PhD
Associate Professor
Dept. of Experimental and Health Sciences
Universitat Pompeu Fabra (UPF)
Barcelona Biomedical Research Park (PRBB)
Dr Aiguader 88
E-08003 Barcelona, Spain
telf: +34.933.160.514
fax: +34.933.160.550
Dear Paul,

coming back to your temporary solution to this bug, please see below ...

On 02/16/2013 01:31 AM, Paul Shannon wrote:
> The workaround depends upon converting your graphBAM to a graphAM, as demonstrated below. If your graph is very large, this may not be practical.
> [...]
>
> g <- as(g.orig, "graphAM")
> edgeDataDefaults(g, attr="a") <- 0
> edgeData(g, from="a", to="b", attr="a") <- 1
> edgeData(g, attr="a", from="a")
> # $`a|b`
> # [1] 1
>
> edgeData(g, attr="a", from="a", to="b") # no bug
> # $`a|b`
> # [1] 1

to facilitate using this workaround in my code, it would be useful for me to be able to coerce the graphAM object back to a graphBAM object with all its attributes. however, it seems that this coercion does not copy the attributes:

gbam <- as(g, "graphBAM")
gbam
A graphBAM graph with undirected edges
Number of Nodes = 4
Number of Edges = 3
edgeData(gbam, attr="a", from="a", to="b")
Error in .verifyBAMAttrs(self, attr) : 'attr' not found: ‘a’

could you fix the coercion from graphAM to graphBAM such that all attributes from the graphAM object are copied into the graphBAM object?

thanks!!
robert.

ps: sessionInfo()

R Under development (unstable) (2012-10-07 r60893)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] graph_1.37.6 vimcom_0.9-7 setwidth_1.0-3 colorout_1.0-0

loaded via a namespace (and not attached):
[1] BiocGenerics_0.5.6 parallel_2.16.0 stats4_2.16.0 tools_2.16.0