ROntoTools: Set returned by keggPathwayGraphs is missing 121 pathways reported by keggPathwayNames
abf ▴ 30
@abf-14661

I'm using ROntoTools functions to fetch KEGG Graphs and pathway names. There are pathways returned by the function "keggPathwayNames" that are missing from the set of pathways returned by keggPathwayGraphs.

kegg_mmu <- ROntoTools::keggPathwayGraphs(
  organism = "mmu", updateCache = TRUE
)


kpn <- ROntoTools::keggPathwayNames(
  organism = "mmu", updateCache = TRUE
)


> length(kegg_mmu)
[1] 222
> 
> length(kpn)
[1] 343
> 
> length(setdiff(names(kpn), names(kegg_mmu)))
[1] 121
> 
> setdiff(names(kpn), names(kegg_mmu))[1:10]
 [1] "path:mmu00010" "path:mmu00020" "path:mmu00030" "path:mmu00040" "path:mmu00051" "path:mmu00052" "path:mmu00053" "path:mmu00061" "path:mmu00062"
[10] "path:mmu00071"

I'm not sure whether there is any rhyme or reason to the missing pathways; for example, path:mmu00020 is the citric acid cycle. Any help in obtaining a complete set of pathways would be greatly appreciated.
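One quick way to look for a pattern is to tabulate the missing IDs by the leading digits of their KEGG map numbers. The sketch below uses only the ten IDs printed above; `tabulate_map_prefix` is a helper name made up for this post, not part of ROntoTools. In KEGG's numbering the 00xxx and 01xxx maps are, as far as I know, mostly metabolic, so a table dominated by those prefixes would point to metabolic maps being the ones dropped.

```r
# The ten IDs shown in the setdiff() output above.
missing_ids <- c(
  "path:mmu00010", "path:mmu00020", "path:mmu00030", "path:mmu00040",
  "path:mmu00051", "path:mmu00052", "path:mmu00053", "path:mmu00061",
  "path:mmu00062", "path:mmu00071"
)

# Hypothetical helper: count pathway IDs by the first two digits of the map number.
tabulate_map_prefix <- function(ids) {
  # Drop "path:" and the lowercase organism code, leaving the five-digit map number.
  map_number <- sub("^path:[a-z]+", "", ids)
  table(prefix = substr(map_number, 1, 2))
}

tab <- tabulate_map_prefix(missing_ids)
print(tab)
```

With the full objects from the session, the same helper could be applied to `setdiff(names(kpn), names(kegg_mmu))` to see all 121 missing IDs at once.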

See below for session info:

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.12      ggfortify_0.4.14     openxlsx_4.2.5       reticulate_1.25      ggrepel_0.9.1        ggplot2_3.3.6        dplyr_1.0.9         
 [8] tibble_3.1.7         tidyr_1.2.0          edgeR_3.38.1         limma_3.52.1         org.Mm.eg.db_3.15.0  AnnotationDbi_1.58.0 IRanges_2.30.0      
[15] S4Vectors_0.34.0     Biobase_2.56.0       ROntoTools_2.24.0    Rgraphviz_2.40.0     KEGGgraph_1.56.0     KEGGREST_1.36.2      boot_1.3-28         
[22] graph_1.74.0         BiocGenerics_0.42.0 

loaded via a namespace (and not attached):
 [1] httr_1.4.3             pkgload_1.2.4          bit64_4.0.5            jsonlite_1.8.0         splines_4.2.0          here_1.0.1            
 [7] brio_1.1.3             assertthat_0.2.1       statmod_1.4.36         blob_1.2.3             GenomeInfoDbData_1.2.8 pillar_1.7.0          
[13] RSQLite_2.2.14         lattice_0.20-45        glue_1.6.2             RColorBrewer_1.1-3     XVector_0.36.0         colorspace_2.0-3      
[19] Matrix_1.4-1           XML_3.99-0.10          pkgconfig_2.0.3        zlibbioc_1.42.0        purrr_0.3.4            scales_1.2.0          
[25] generics_0.1.2         ellipsis_0.3.2         cachem_1.0.6           withr_2.5.0            cli_3.3.0              magrittr_2.0.3        
[31] crayon_1.5.1           memoise_2.0.1          fansi_1.0.3            tools_4.2.0            lifecycle_1.0.1        stringr_1.4.0         
[37] munsell_0.5.0          locfit_1.5-9.5         zip_2.2.0              Biostrings_2.64.0      compiler_4.2.0         GenomeInfoDb_1.32.2   
[43] rlang_1.0.2            RCurl_1.98-1.7         rstudioapi_0.13        bitops_1.0-7           testthat_3.1.4         gtable_0.3.0          
[49] curl_4.3.2             DBI_1.1.2              R6_2.5.1               gridExtra_2.3          fastmap_1.1.0          bit_4.0.4             
[55] utf8_1.2.2             rprojroot_2.0.3        desc_1.4.1             stringi_1.7.6          parallel_4.2.0         Rcpp_1.0.8.3          
[61] vctrs_0.4.1            png_0.1-7              tidyselect_1.1.2

One thing I have observed is that for some, but not all, of the missing pathways, the KEGG website does not provide a link to the KGML file. For other missing pathways, however, the KGML file is available. For example:

> "path:mmu03010" %in% names(kegg_mmu)
[1] FALSE

However, the KEGG page for Ribosome (path:mmu03010), https://www.kegg.jp/pathway/mmu03010, does provide a KGML download link.

On the other hand, Biosynthesis of Amino Acids, which is also missing, does not offer a KGML download link:

> "path:mmu01230" %in% names(kegg_mmu)
[1] FALSE

Biosynthesis of Amino Acids -- https://www.kegg.jp/pathway/mmu01230

On the third hand, I can access the KGML for Biosynthesis of Amino Acids using KEGGgraph:

> retrieveKGML("mmu01230", organism="mmu", "test.xml")
--2022-08-05 20:37:37--  http://rest.kegg.jp/get/mmu01230/kgml
Resolving rest.kegg.jp (rest.kegg.jp)... 133.103.200.27
Connecting to rest.kegg.jp (rest.kegg.jp)|133.103.200.27|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://rest.kegg.jp/get/mmu01230/kgml [following]
--2022-08-05 20:37:38--  https://rest.kegg.jp/get/mmu01230/kgml
Connecting to rest.kegg.jp (rest.kegg.jp)|133.103.200.27|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: ‘test.xml’

     0K .......... .......... .......... .......... .......... 84.3K
    50K .......... .......... .......... .........              201K=0.8s

2022-08-05 20:37:39 (113 KB/s) - ‘test.xml’ saved [91515]

Another update:

Even though I can download the files for the missing pathways, they don't seem to load correctly:

> parseKGML2DataFrame(
+     "data/mmu03010.xml"
+   )
data frame with 0 columns and 0 rows

> parseKGML2DataFrame(
+     "data/mmu01230.xml"
+   )
data frame with 0 columns and 0 rows
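To see why the data frame comes back empty, it may help to count the element types in the KGML file directly. As far as I understand, the edges that KEGGgraph reports come from `<relation>` elements, whereas metabolic maps tend to describe their connectivity through `<reaction>` elements. The sketch below uses the XML package; `count_kgml_elements` is a hypothetical helper, and the toy document is a made-up stand-in so the example runs without a download.

```r
library(XML)

# Hypothetical helper: count entry, relation and reaction elements in a KGML document.
# `kgml` is a file path, or the XML text itself when asText = TRUE.
count_kgml_elements <- function(kgml, asText = FALSE) {
  doc <- xmlParse(kgml, asText = asText)
  c(
    entry    = length(getNodeSet(doc, "/pathway/entry")),
    relation = length(getNodeSet(doc, "/pathway/relation")),
    reaction = length(getNodeSet(doc, "/pathway/reaction"))
  )
}

# Made-up miniature KGML-like document: two entries, one reaction, no relations.
toy_kgml <- '<pathway name="path:toy00000" org="toy" number="00000">
  <entry id="1" name="toy:1" type="gene"/>
  <entry id="2" name="cpd:C00001" type="compound"/>
  <reaction id="1" name="rn:R00001" type="reversible"/>
</pathway>'

counts <- count_kgml_elements(toy_kgml, asText = TRUE)
print(counts)
```

On the downloaded files this would be called as `count_kgml_elements("data/mmu01230.xml")`. A relation count of zero would be consistent with the empty data frame.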

For the ribosome this makes sense, as the "Pathway" does not appear to have edges:

>  mmu03010 <- parseKGML2Graph(
+     "data/mmu03010.xml"
+   )
> mmu03010
A graphNEL graph with directed edges
Number of Nodes = 179 
Number of Edges = 0 
> nodes(mmu03010)[1:10]
 [1] "mmu:14109"  "mmu:270106" "mmu:19943"  "mmu:20044"  "mmu:66475"  "mmu:20084"  "mmu:20090"  "mmu:68052"  "mmu:27207" 
[10] "mmu:20054"

For Biosynthesis of amino acids, it seems like it would make sense to have edges when "non-genes", i.e., metabolites, are included.

>   mmu01230 <- parseKGML2Graph(
+     "data/mmu01230.xml"
+   )
> mmu01230
A graphNEL graph with directed edges
Number of Nodes = 79 
Number of Edges = 0 


>   mmu01230 <- parseKGML2Graph(
+     "data/mmu01230.xml", genesOnly=FALSE
+   )
> mmu01230
A graphNEL graph with directed edges
Number of Nodes = 360 
Number of Edges = 0
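A possible next step, sketched here under the assumption that the metabolic maps store their connectivity as reactions rather than relations, is to parse the file into a pathway object and inspect its reactions. KEGGgraph provides `parseKGML`, `getReactions` and `KEGGpathway2reactionGraph` for this; `summarize_reactions` is a hypothetical wrapper written for this post, and I have not verified what it returns for these two files.

```r
# Hypothetical wrapper: summarize the reaction content of a KGML file.
# Returns NULL if the file is missing or KEGGgraph is not installed.
summarize_reactions <- function(kgml_file) {
  if (!file.exists(kgml_file) || !requireNamespace("KEGGgraph", quietly = TRUE)) {
    return(NULL)
  }
  pathway <- KEGGgraph::parseKGML(kgml_file)
  reactions <- KEGGgraph::getReactions(pathway)

  # Build the compound-level reaction graph only when reactions are present.
  reaction_graph <- NULL
  if (length(reactions) > 0) {
    reaction_graph <- KEGGgraph::KEGGpathway2reactionGraph(pathway)
  }

  list(
    n_reactions = length(reactions),
    n_nodes = if (is.null(reaction_graph)) 0L else graph::numNodes(reaction_graph),
    n_edges = if (is.null(reaction_graph)) 0L else graph::numEdges(reaction_graph)
  )
}

# Path used earlier in this thread; the result is NULL if the file is not present.
res <- summarize_reactions("data/mmu01230.xml")
print(res)
```

If `n_reactions` is non-zero for mmu01230 while the relation-based graph has no edges, that would suggest the pathway is not empty, only encoded differently from the signaling pathways that keggPathwayGraphs returns.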