Question: RpsiXML issues with latest Biogrid release files
9.9 years ago by
Sara JC Gosline60 wrote:
Hello again,

I have recently installed and used RpsiXML to successfully parse the latest xml files from intact. However, when I try the same functions with the latest version of Biogrid (to obtain assay-specific interactions instead of experiment-specific), I get a graph with a single node "NA" and 1 interaction. SessionInfo is at the end of the email.

***Parsing xml files to graph:
I used the 'PCA' file since it is relatively short:

> g<-psimi25XML2Graph('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml',BIOGRID.PSIMI25,type='interaction',verbose=T)
1 Entries found
Parsing entry 1
Parsing experiments: ...............................................
Parsing interactors:
100% ========================================>
Parsing interactions:
100% ========================================>
> g
[1] "psimi25Graph"
attr(,"package")
[1] "RpsiXML"
> nodes(g)
[1] "NA"
> edges(g)
$`NA`
[1] "NA"

***Parsing xml file without graph:
To determine if this is something wrong with the parsing, I redo the parsing without formatting to a graph object:

> g<-parsePsimi25Interaction('../biogrid/psiml25/BIOGRID-SYSTEM-PCA-2.0.59.psi25.xml',BIOGRID.PSIMI25,verbose=T)

Here is the first bit of output:

> g
==================================
interaction entry ( 2009-11-25 ):
==================================
[ organism ]: Arabidopsis thaliana Saccharomyces cerevisiae Schizosaccharomyces pombe
[ taxonomy ID ]: 3702 4932 4896
[ interactors ]: there are 1214 interactors in total, here are the first few ones:
     sourceDb sourceId shortLabel uniprotId organismName               taxId
<NA> ""       "1"      "BZR1"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "2"      "GRF6"     NA        "Arabidopsis thaliana"     "3702"
<NA> ""       "3"      "FUN14"    NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "4"      "UIP4"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "5"      "ALO1"     NA        "Saccharomyces cerevisiae" "4932"
<NA> ""       "6"      "SPO7"     NA        "Saccharomyces cerevisiae" "4932"
...
[ interactions ]: there are 2736 interactions in total, here are the first few ones:
[[1]]
interaction ( NA ):
---------------------------------
[ source database ]:
[ source experiment ID ]: 1
[ interaction type ]: protein complementation assay
[ experiment ]: pubmed 17681130
[ participant ]: NA NA
[ bait ]: 1
[ bait UniProt ]: NA
[ prey ]: 2
[ prey UniProt ]: NA

So the interactors and interactions are being parsed correctly, but not being retrieved properly. When I look at the attributes of each interaction I get mostly NA's:

> attributes(g@interactions[[1]])
$sourceDb
[1] ""

$sourceId
[1] NA

$interactionType
[1] "protein complementation assay"

$expPubMed
[1] "17681130"

$expSourceId
[1] "1"

$confidenceValue
[1] NA

$participant
<NA> <NA>
  NA   NA

$bait
[1] "1"

$baitUniProt
[1] NA

$prey
[1] "2"

$preyUniProt
[1] NA

$inhibitor
[1] NA

$neutralComponent
[1] NA

$class
[1] "psimi25Interaction"
attr(,"package")
[1] "RpsiXML"

***Conclusion:
Is there an easy workaround for this? Maybe where I can manually look up identifiers?

Thanks,
sara

***SessionInfo:
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] grid      splines   tools     stats     graphics  grDevices utils
[8] datasets  methods   base

other attached packages:
 [1] gtools_2.5.0-1      multicore_0.1-3     ppiStats_1.8.0
 [4] RColorBrewer_1.0-2  lattice_0.17-17     ScISI_1.14.0
 [7] apComplex_2.8.0     ppiData_0.1.13      Rgraphviz_1.20.4
[10] org.Sc.sgd.db_2.2.6 GOstats_2.8.0       Category_2.8.4
[13] genefilter_1.22.0   survival_2.34-1     GO.db_2.2.5
[16] RSQLite_0.7-1       DBI_0.2-4           RpsiXML_1.0.0
[19] RBGL_1.20.0         hypergraph_1.14.0   graph_1.20.0
[22] XML_2.3-0           annotate_1.20.1     xtable_1.5-6
[25] AnnotationDbi_1.4.3 Biobase_2.2.2

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0
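On the question of looking up identifiers manually: the printed output suggests that each interaction stores its bait and prey as interactor source IDs ("1", "2") while the readable names sit in the interactor table (shortLabel). A minimal sketch of that lookup in base R follows. It is only an illustration under stated assumptions: the data frames are hand-typed stand-ins for the values printed in the post, not objects returned by RpsiXML, and the names interactors, interactions, id2label and edge_labels are made up for this example.

```r
# Stand-in for the interactor table printed in the post
# (hand-typed copy of the first six rows, not an RpsiXML object).
interactors <- data.frame(
  sourceId   = c("1", "2", "3", "4", "5", "6"),
  shortLabel = c("BZR1", "GRF6", "FUN14", "UIP4", "ALO1", "SPO7"),
  stringsAsFactors = FALSE
)

# Stand-in for the first interaction in the post: bait "1", prey "2".
interactions <- data.frame(
  bait = "1",
  prey = "2",
  stringsAsFactors = FALSE
)

# Named character vector that maps a source ID to its short label.
id2label <- setNames(interactors$shortLabel, interactors$sourceId)

# Translate bait/prey IDs into labels; IDs missing from the table become NA.
edge_labels <- data.frame(
  bait = unname(id2label[interactions$bait]),
  prey = unname(id2label[interactions$prey]),
  stringsAsFactors = FALSE
)

print(edge_labels)
```

With real data, the two stand-in tables would have to be filled from the parsed entry. How best to extract them from the RpsiXML object is not shown in this thread, so that step is left open here.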
modified 9.9 years ago by Jitao David Zhang • written 9.9 years ago by Sara JC Gosline
Answer: RpsiXML issues with latest Biogrid release files
9.9 years ago by
Tony Chiang570 wrote:
Hi Sara,

The current release of R is 2.10. I don't know if this will fix the problem, but the current versions of the packages are built for the latest release of R, so the first thing to try is updating your R which will update the version of RpsiXML. I will look at your example in a bit.

Tony

On Mon, Dec 7, 2009 at 7:03 AM, Sara JC Gosline <sara.gosline@mail.mcgill.ca> wrote:
> Hello again,
> [...]
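Before and after updating, it can help to confirm which versions are actually in use. The snippet below is a small sketch using only base R and utils functions. The 2.10.0 threshold is taken from the reply above, and the variable name r_ok is an arbitrary choice for this example.

```r
# Compare the running R version with the 2.10.0 release mentioned above.
r_ok <- getRversion() >= "2.10.0"
cat("R version:", as.character(getRversion()),
    "- at least 2.10.0:", r_ok, "\n")

# Report the installed RpsiXML version, if the package is available at all.
if (requireNamespace("RpsiXML", quietly = TRUE)) {
  cat("RpsiXML version:", as.character(packageVersion("RpsiXML")), "\n")
} else {
  cat("RpsiXML is not installed in this library\n")
}
```

Posting the output together with sessionInfo() makes it easier for others to tell whether a problem is tied to an outdated installation.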
Hi Tony,

Thanks, I updated my R version and bioconductor and was still able to reproduce the error on a different machine. I sent the .xml file to David to reproduce. Here is my new sessionInfo():

R version 2.10.0 (2009-10-26)
i386-apple-darwin9.8.0

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RpsiXML_1.6.0       hypergraph_1.17.0   XML_2.5-3           RBGL_1.21.2
[5] graph_1.23.3        annotate_1.24.0     AnnotationDbi_1.8.1 Biobase_2.6.1

loaded via a namespace (and not attached):
[1] DBI_0.2-4     RSQLite_0.7-1 tools_2.10.0  xtable_1.5-6

sara

On 07/12/09 11:03 AM, "Tony Chiang" <tchiang at fhcrc.org> wrote:
> Hi Sara,
> [...]
written 9.9 years ago by Sara JC Gosline
Answer: RpsiXML issues with latest Biogrid release files
9.9 years ago by
Jitao David Zhang340 wrote:
Hi Sara,

Thanks for reporting the issue. Could you please send me a copy of the file you used? I will try to reproduce the error and find the fix then.

Best wishes,
David

2009/12/7 Sara JC Gosline <sara.gosline@mail.mcgill.ca>
> Hello again,
> [...]

--
Jitao David Zhang
Biological Statistics and Computational Biology Ph.D.
Division of Molecular Genome Analysis
DKFZ, Heidelberg D-69120, Germany
http://www.NextBioMotif.com/