Error creating SPIA data for KEGG Orthology (KO) Database KEGG xml files
@d24570fc
Last seen 8 months ago
United States

Hi all,

I'm trying to create a SPIA data file from all 483 xml files for the KEGG Orthology (KO) Database. I'm working with a non-model organism that KEGG does not support as its own organism, so I have to use the KEGG Orthology (KO) Database instead of a 3-letter organism code. I am running into a "subscript out of bounds" error and was hoping to get help figuring it out. Is this database just not supported? I know it's supported by clusterProfiler and pathview, and I was hoping it would also be supported by SPIA.

The demo in the SPIA documentation (see below) runs fine in my R session. It still runs when I set organism="ko" and when I point out.path at SPIA_output; it only breaks when I change kgml.path to ko_combined_dir.

Demo in SPIA documentation:

mydir=system.file("extdata/keggxml/hsa",package="SPIA")
dir(mydir)
# [1] "hsa03013.xml" "hsa03050.xml" "hsa04914.xml" "hsa05210.xml"
makeSPIAdata(kgml.path=mydir,organism="hsa",out.path="./")

My Code:

# Define xml file directories for ko (generic organism) pathways
ko_combined_dir <- paste0(script_dir, "/data/KEGG_ko_xml_Files/ko_combined")

# Define SPIA data output
SPIA_output <- paste0(script_dir, "/results/SPIA/")

# Run SPIA::makeSPIAdata
makeSPIAdata(kgml.path = ko_combined_dir,
             organism = "ko",
             out.path = SPIA_output)

The error I get is:

Error in L[[ll]][[re]][, nd] : subscript out of bounds

A look into ko_combined_dir:

> head(dir(ko_combined_dir), 16)
  [1] "ko00010.xml" "ko00020.xml" "ko00030.xml" "ko00040.xml" "ko00051.xml" "ko00052.xml" "ko00053.xml" "ko00061.xml"
  [9] "ko00062.xml" "ko00071.xml" "ko00073.xml" "ko00100.xml" "ko00120.xml" "ko00121.xml" "ko00130.xml" "ko00140.xml"

Session Info:

R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SPIA_2.52.0                 KEGGgraph_1.60.0            org.Tmaccoyii.eg.db_0.1    
 [4] org.Sjaponicus.eg.db_0.1    ReactomePA_1.44.0           ggnewscale_0.4.9           
 [7] conflicted_1.2.0.9000       devtools_2.4.5              usethis_2.2.2              
[10] cowplot_1.1.1               RColorBrewer_1.1-3          lintr_3.1.0                
[13] httpgd_1.3.1                DBI_1.1.3                   RSQLite_2.3.1              
[16] DT_0.30                     R.utils_2.12.2              R.oo_1.25.0                
[19] R.methodsS3_1.8.2           png_0.1-8                   gridExtra_2.3              
[22] pheatmap_1.0.12             readxl_1.4.3                lubridate_1.9.3            
[25] forcats_1.0.0               stringr_1.5.0               readr_2.1.4                
[28] tidyr_1.3.0                 tibble_3.2.1                tidyverse_2.0.0            
[31] magick_2.8.0                ashr_2.2-63                 ggrepel_0.9.4              
[34] ggplot2_3.4.4               DEGreport_1.36.0            scico_1.5.0                
[37] apeglm_1.22.1               tximport_1.28.0             dplyr_1.1.3                
[40] SBGNview_1.14.0             SBGNview.data_1.14.0        edgeR_3.42.4               
[43] limma_3.56.2                pathview_1.40.0             AnnotationHub_3.8.0        
[46] BiocFileCache_2.8.0         dbplyr_2.3.4                AnnotationForge_1.42.2     
[49] biomaRt_2.56.1              vsn_3.68.0                  DOSE_3.26.1                
[52] clusterProfiler_4.9.3.002   GO.db_3.17.0                AnnotationDbi_1.62.2       
[55] enrichplot_1.20.3           purrr_1.0.2                 here_1.0.1                 
[58] DESeq2_1.40.2               SummarizedExperiment_1.30.2 Biobase_2.60.0             
[61] MatrixGenerics_1.12.3       matrixStats_1.0.0           GenomicRanges_1.52.1       
[64] GenomeInfoDb_1.36.4         IRanges_2.34.1              S4Vectors_0.38.2           
[67] BiocGenerics_0.46.0        

loaded via a namespace (and not attached):
  [1] progress_1.2.2                urlchecker_1.0.1              Biostrings_2.68.1            
  [4] vctrs_0.6.4                   digest_0.6.33                 shape_1.4.6                  
  [7] mixsqp_0.3-48                 MASS_7.3-60                   reshape_0.8.9                
 [10] reshape2_1.4.4                SQUAREM_2021.1                httpuv_1.6.11                
 [13] foreach_1.5.2                 qvalue_2.32.0                 withr_2.5.1                  
 [16] psych_2.3.9                   xfun_0.40                     ggfun_0.1.3                  
 [19] ellipsis_0.3.2                memoise_2.0.1                 cyclocomp_1.1.1              
 [22] gson_0.1.0                    profvis_0.3.8                 systemfonts_1.0.5            
 [25] tidytree_0.4.5                GlobalOptions_0.1.2           logging_0.10-108             
 [28] prettyunits_1.2.0             KEGGREST_1.40.1               promises_1.2.1               
 [31] httr_1.4.7                    ps_1.7.5                      rstudioapi_0.15.0            
 [34] miniUI_0.1.1.1                generics_0.1.3                reactome.db_1.84.0           
 [37] processx_3.8.2                curl_5.1.0                    zlibbioc_1.46.0              
 [40] ggraph_2.1.0                  polyclip_1.10-6               GenomeInfoDbData_1.2.10      
 [43] interactiveDisplayBase_1.38.0 xtable_1.8-4                  desc_1.4.2                   
 [46] doParallel_1.0.17             evaluate_0.22                 S4Arrays_1.0.6               
 [49] preprocessCore_1.62.1         hms_1.1.3                     irlba_2.3.5.1                
 [52] colorspace_2.1-0              filelock_1.0.2                magrittr_2.0.3               
 [55] Rgraphviz_2.44.0              later_1.3.1                   viridis_0.6.4                
 [58] ggtree_3.8.2                  lattice_0.21-9                XML_3.99-0.14                
 [61] shadowtext_0.1.2              pillar_1.9.0                  nlme_3.1-163                 
 [64] iterators_1.0.14              compiler_4.3.1                stringi_1.7.12               
 [67] plyr_1.8.9                    crayon_1.5.2                  abind_1.4-5                  
 [70] truncnorm_1.0-9               ggdendro_0.1.23               gridGraphics_0.5-1           
 [73] emdbook_1.3.13                locfit_1.5-9.8                graphlayouts_1.0.1           
 [76] org.Hs.eg.db_3.17.0           bit_4.0.5                     fastmatch_1.1-4              
 [79] codetools_0.2-19              crosstalk_1.2.0               bslib_0.5.1                  
 [82] GetoptLong_1.0.5              mime_0.12                     splines_4.3.1                
 [85] circlize_0.4.15               Rcpp_1.0.11                   HDO.db_0.99.1                
 [88] cellranger_1.1.0              knitr_1.44                    blob_1.2.4                   
 [91] utf8_1.2.3                    clue_0.3-65                   BiocVersion_3.17.1           
 [94] fs_1.6.3                      Rdpack_2.5                    pkgbuild_1.4.2               
 [97] ggplotify_0.1.2               Matrix_1.6-1.1                callr_3.7.3                  
[100] tzdb_0.4.0                    svglite_2.1.2                 tweenr_2.0.2                 
[103] pkgconfig_2.0.3               tools_4.3.1                   cachem_1.0.8                 
[106] rbibutils_2.2.15              viridisLite_0.4.2             rvest_1.0.3                  
[109] numDeriv_2016.8-1.1           graphite_1.46.0               fastmap_1.1.1                
[112] rmarkdown_2.25                scales_1.2.1                  grid_4.3.1                   
[115] sass_0.4.7                    broom_1.0.5                   patchwork_1.1.3              
[118] coda_0.19-4                   BiocManager_1.30.22           graph_1.78.0                 
[121] farver_2.1.1                  tidygraph_1.2.3               scatterpie_0.2.1             
[124] yaml_2.3.7                    cli_3.6.1                     webshot_0.5.5                
[127] lifecycle_1.0.3               mvtnorm_1.2-3                 sessioninfo_1.2.2            
[130] backports_1.4.1               BiocParallel_1.34.2           timechange_0.2.0             
[133] gtable_0.3.4                  rjson_0.2.21                  parallel_4.3.1               
[136] ape_5.7-1                     jsonlite_1.8.7                rex_1.2.1                    
[139] bitops_1.0-7                  kableExtra_1.3.4              bit64_4.0.5                  
[142] yulab.utils_0.1.0             bdsmatrix_1.3-6               jquerylib_0.1.4              
[145] highr_0.10                    GOSemSim_2.27.3               lazyeval_0.2.2               
[148] shiny_1.7.5.1                 ConsensusClusterPlus_1.64.0   htmltools_0.5.6.1            
[151] affy_1.78.2                   rappdirs_0.3.3                glue_1.6.2                   
[154] XVector_0.40.0                RCurl_1.98-1.12               rprojroot_2.0.3              
[157] treeio_1.24.3                 mnormt_2.1.1                  igraph_1.5.1                 
[160] invgamma_1.1                  R6_2.5.1                      labeling_0.4.3               
[163] cluster_2.1.4                 bbmle_1.0.25                  pkgload_1.3.3                
[166] aplot_0.2.2                   DelayedArray_0.26.7           tidyselect_1.2.0             
[169] ggforce_0.4.1                 xml2_1.3.5                    munsell_0.5.0                
[172] rsvg_2.6.0                    affyio_1.70.0                 data.table_1.14.8            
[175] htmlwidgets_1.6.2             fgsea_1.26.0                  ComplexHeatmap_2.16.0        
[178] rlang_1.1.1                   remotes_2.4.2.1               fansi_1.0.5

Best,

Benji

SPIA • 569 views
@james-w-macdonald-5106
Last seen 9 hours ago
United States

It's probably due to differences between the ko XML files and, e.g., the hsa versions.

library(SPIA)
## ko03013
> z <- parseKGML(dir("ko", "xml$", full.names = TRUE)[1])
## hsa03013
> zz <- parseKGML(paste0(system.file("extdata/keggxml/hsa", package="SPIA"), "/hsa03013.xml"))
> z
KEGG Pathway
[ Title ]: Nucleocytoplasmic transport
[ Name ]: path:ko03013
[ Organism ]: ko
[ Number ] :03013
[ Image ] :https://www.kegg.jp/kegg/pathway/ko/ko03013.png
[ Link ] :https://www.kegg.jp/kegg-bin/show_pathway?ko03013
------------------------------------------------------------
Statistics:
    79 node(s)
    0 edge(s)
    0 reaction(s)
------------------------------------------------------------
> zz
KEGG Pathway
[ Title ]: RNA transport
[ Name ]: path:hsa03013
[ Organism ]: hsa
[ Number ] :03013
[ Image ] :http://www.genome.jp/kegg/pathway/hsa/hsa03013.png
[ Link ] :http://www.genome.jp/kegg-bin/show_pathway?hsa03013
------------------------------------------------------------
Statistics:
    120 node(s)
    80 edge(s)
    0 reaction(s)
------------------------------------------------------------

There are no edges for the ko03013.xml file, but 80 for the hsa version.

Visually I don't see any difference between the two, so maybe you can just use the hsa versions?
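
To see how widespread the missing edges are across a whole directory of KGML files, a rough base-R sketch like the one below may help. It counts `<relation ...>` tags in each file, on the assumption that these are what `parseKGML` reports as edges. The function name `count_kgml_relations` is made up for illustration, and it is only a guess (not confirmed in this thread) that zero-relation files are what trigger the subscript error.

```r
# Hypothetical helper: count <relation> elements in every KGML file in a
# directory. Assumes KGML relations correspond to the "edge(s)" shown by
# parseKGML(); uses plain text matching, not a real XML parser.
count_kgml_relations <- function(kgml_dir) {
  files <- list.files(kgml_dir, pattern = "\\.xml$", full.names = TRUE)
  counts <- vapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    # Match opening tags only; "</relation>" does not match this pattern.
    hits <- gregexpr("<relation[ >]", lines)
    # gregexpr() returns -1 for lines without a match, so count positions > 0.
    sum(vapply(hits, function(h) sum(h > 0), integer(1)))
  }, integer(1))
  names(counts) <- basename(files)
  counts
}

# Example use with the directory from the question (not run here):
# n_rel <- count_kgml_relations(ko_combined_dir)
# table(n_rel == 0)            # how many files have no relations at all
# names(n_rel)[n_rel == 0]     # which files those are
```

If most or all ko files come back with zero relations, that would be consistent with the ko reference maps simply not carrying the relation information that the organism-specific files do.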


Thanks for this insight! I think there may be a slight difference though.

In https://www.genome.jp/kegg/pathway/hsa/hsa03013.png it seems like the green boxes are the only ones included in the pathway, whereas ko03013 has all possible elements included. I'll have to look into this a little more. Maybe I'll just have to go with using a closely related organism for this step of my analysis.
