RpsiXML issues with latest Biogrid release files
@sara-jc-gosline-3831
Last seen 9.6 years ago
Hello again,

I have recently installed and used RpsiXML to successfully parse the latest XML files from IntAct. However, when I try the same functions with the latest version of Biogrid (to obtain assay-specific interactions instead of experiment-specific), I get a graph with a single node "NA" and 1 interaction. SessionInfo is at the end of the email.

***Parsing xml files to graph:
I used the 'PCA' file since it is relatively short:

> g<-psimi25XML2Graph('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml',BIOGRID.PSIMI25,type='interaction',verbose=T)
1 Entries found
Parsing entry 1
Parsing experiments: ...............................................
Parsing interactors:
100% ========================================>
Parsing interactions:
100% ========================================>
> g
[1] "psimi25Graph"
attr(,"package")
[1] "RpsiXML"
> nodes(g)
[1] "NA"
> edges(g)
$`NA`
[1] "NA"

***Parsing xml file without graph:
To determine whether something is wrong with the parsing, I redid the parsing without formatting to a graph object:

> g<-parsePsimi25Interaction('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml',BIOGRID.PSIMI25,verbose=T)

Here is the first bit of output:

> g
==================================
interaction entry ( 2009-11-25 ):
==================================
[ organism ]: Arabidopsis thaliana Saccharomyces cerevisiae Schizosaccharomyces pombe
[ taxonomy ID ]: 3702 4932 4896
[ interactors ]: there are 1214 interactors in total, here are the first few ones:
     sourceDb sourceId shortLabel uniprotId organismName               taxId
<NA> ""       "1"      "BZR1"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "2"      "GRF6"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "3"      "FUN14"    NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "4"      "UIP4"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "5"      "ALO1"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "6"      "SPO7"     NA        "Saccharomyces cerevisiae" "4932"
...
[ interactions ]: there are 2736 interactions in total, here are the first few ones:
[[1]]
interaction ( NA ):
---------------------------------
[ source database ]:
[ source experiment ID ]: 1
[ interaction type ]: protein complementation assay
[ experiment ]: pubmed 17681130
[ participant ]: NA NA
[ bait ]: 1
[ bait UniProt ]: NA
[ prey ]: 2
[ prey UniProt ]: NA

So the interactors and interactions are being parsed correctly, but the identifiers are not being retrieved properly. When I look at the attributes of each interaction I get mostly NA's:

> attributes(g@interactions[[1]])
$sourceDb
[1] ""

$sourceId
[1] NA

$interactionType
[1] "protein complementation assay"

$expPubMed
[1] "17681130"

$expSourceId
[1] "1"

$confidenceValue
[1] NA

$participant
<NA> <NA>
  NA   NA

$bait
[1] "1"

$baitUniProt
[1] NA

$prey
[1] "2"

$preyUniProt
[1] NA

$inhibitor
[1] NA

$neutralComponent
[1] NA

$class
[1] "psimi25Interaction"
attr(,"package")
[1] "RpsiXML"

***Conclusion:
Is there an easy workaround for this? Maybe one where I can manually look up identifiers?

Thanks,
sara

***SessionInfo:
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] grid      splines   tools     stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
 [1] gtools_2.5.0-1      multicore_0.1-3     ppiStats_1.8.0
 [4] RColorBrewer_1.0-2  lattice_0.17-17     ScISI_1.14.0
 [7] apComplex_2.8.0     ppiData_0.1.13      Rgraphviz_1.20.4
[10] org.Sc.sgd.db_2.2.6 GOstats_2.8.0       Category_2.8.4
[13] genefilter_1.22.0   survival_2.34-1     GO.db_2.2.5
[16] RSQLite_0.7-1       DBI_0.2-4           RpsiXML_1.0.0
[19] RBGL_1.20.0         hypergraph_1.14.0   graph_1.20.0
[22] XML_2.3-0           annotate_1.20.1     xtable_1.5-6
[25] AnnotationDbi_1.4.3 Biobase_2.2.2

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0
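On the manual-lookup question in the post above, one possible stopgap is to translate the internal bait and prey IDs into gene labels by hand, using the interactor table's sourceId and shortLabel columns. The following is only a minimal sketch in base R under stated assumptions: the two data frames are hypothetical stand-ins built from the values printed in the post, not objects returned by RpsiXML, and the second bait/prey pair is invented purely for illustration.

```r
# Hypothetical stand-in for the interactor table printed in the post
# (only the sourceId and shortLabel columns are needed here).
interactors <- data.frame(
  sourceId   = c("1", "2", "3", "4"),
  shortLabel = c("BZR1", "GRF6", "FUN14", "UIP4"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the bait/prey fields of the parsed interactions.
# The first pair (bait "1", prey "2") is from the post; the second is invented.
pairs <- data.frame(
  bait = c("1", "3"),
  prey = c("2", "4"),
  stringsAsFactors = FALSE
)

# Named vector used as a lookup table: internal ID -> short label.
id_to_label <- setNames(interactors$shortLabel, interactors$sourceId)

# Translate each bait/prey ID; IDs without a match would come back as NA.
edge_list <- data.frame(
  bait = unname(id_to_label[pairs$bait]),
  prey = unname(id_to_label[pairs$prey]),
  stringsAsFactors = FALSE
)

print(edge_list)
```

With the real parsed object, the bait and prey IDs could presumably be collected from the slots shown in the attributes() output, for example with sapply(g@interactions, function(x) x@bait); that step is untested here and should be treated as an assumption.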
GO Organism Arabidopsis thaliana graph RpsiXML • 1.4k views
Tony Chiang ▴ 570
@tony-chiang-1769
Last seen 9.6 years ago
Hi Sara,

The current release of R is 2.10. I don't know if this will fix the problem, but the current versions of the packages are built for the latest release of R, so the first thing to try is updating your R, which will update the version of RpsiXML. I will look at your example in a bit.

Tony
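Before and after an update like the one suggested above, it can help to confirm which versions are actually in use. This is a small sketch in base R; the helper name installed_version is made up for this example, and the package list is just the subset of the thread's packages that seemed most relevant.

```r
# Hypothetical helper: return the installed version of a package as a
# string, or NA if the package is not installed on this machine.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Packages taken from the sessionInfo() output in this thread.
pkgs <- c("RpsiXML", "graph", "XML")
versions <- vapply(pkgs, installed_version, character(1))

print(R.version.string)
print(versions)
```

Posting the full sessionInfo() output, as done in this thread, remains the more complete report; the sketch only narrows it to the packages of interest.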
Hi Tony,

Thanks. I updated my R version and Bioconductor and was still able to reproduce the error on a different machine. I sent the .xml file to David to reproduce. Here is my new sessionInfo():

R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RpsiXML_1.6.0       hypergraph_1.17.0   XML_2.5-3           RBGL_1.21.2
[5] graph_1.23.3        annotate_1.24.0     AnnotationDbi_1.8.1 Biobase_2.6.1

loaded via a namespace (and not attached):
[1] DBI_0.2-4     RSQLite_0.7-1 tools_2.10.0  xtable_1.5-6

sara
@jitao-david-zhang-3188
Last seen 7.2 years ago
Hi Sara,

Thanks for reporting the issue. Could you please send me a copy of the file you used? I will try to reproduce the error and then find a fix.

Best wishes,
David

--
Jitao David Zhang
Biological Statistics and Computational Biology Ph.D.
Division of Molecular Genome Analysis
DKFZ, Heidelberg D-69120, Germany
http://www.NextBioMotif.com/