Hi all,
I'm trying to create a SPIA data file from all 483 xml files for the KEGG Orthology (KO) database. I'm working with a non-model organism that KEGG does not support as its own organism, so I have to use the KO database instead of a 3-letter organism code. I am running into a "subscript out of bounds" error and was hoping to get help figuring out this issue. Is this database simply not supported? I know it's supported by clusterProfiler and pathview, and I was hoping it would also be supported by SPIA.
The demo in the SPIA documentation (see below) runs fine in my R session. The demo even runs when I change organism="ko" and when I change out.path="SPIA_output", and only breaks when I change the directory to ko_combined_dir.
Demo in SPIA documentation:
mydir=system.file("extdata/keggxml/hsa",package="SPIA")
dir(mydir)
[1] "hsa03013.xml" "hsa03050.xml" "hsa04914.xml" "hsa05210.xml"
makeSPIAdata(kgml.path=mydir,organism="hsa",out.path="./")
My Code:
# Define xml file directories for ko (generic organism) pathways
ko_combined_dir <- paste0(script_dir, "/data/KEGG_ko_xml_Files/ko_combined")
# Define SPIA data output
SPIA_output <- paste0(script_dir, "/results/SPIA/")
# Run SPIA::makeSPIAdata
makeSPIAdata(kgml.path = ko_combined_dir,
             organism = "ko",
             out.path = SPIA_output)
The error I get is:
Error in L[[ll]][[re]][, nd] : subscript out of bounds
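Since the error message does not say which file it came from, one way to narrow this down might be to run makeSPIAdata on each xml file separately and record which ones fail. The following is only a sketch under assumptions: try_each_file and run_spia_on_file are hypothetical helper names (not part of SPIA), and it assumes makeSPIAdata can process a directory containing a single KGML file. It does not explain the error, it only helps locate it.

```r
# Hypothetical helper (not part of SPIA): apply `run_one` to each file
# and record whether it succeeded or which error message it raised.
try_each_file <- function(files, run_one) {
  results <- lapply(files, function(f) {
    msg <- tryCatch({
      run_one(f)
      NA_character_                      # no error for this file
    }, error = function(e) conditionMessage(e))
    data.frame(file = basename(f),
               ok = is.na(msg),
               message = msg,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Hypothetical wrapper: copy one KGML file into a scratch directory and
# run SPIA::makeSPIAdata on that directory alone (assumes SPIA is installed).
run_spia_on_file <- function(f, organism = "ko") {
  scratch <- tempfile("kgml_")
  out <- tempfile("spia_out_")
  dir.create(scratch)
  dir.create(out)
  on.exit(unlink(c(scratch, out), recursive = TRUE), add = TRUE)
  file.copy(f, scratch)
  SPIA::makeSPIAdata(kgml.path = scratch,
                     organism = organism,
                     out.path = paste0(out, "/"))
}

# Usage with the directory from this post (left commented out because it
# needs SPIA and the local xml files):
# files  <- list.files(ko_combined_dir, pattern = "\\.xml$", full.names = TRUE)
# report <- try_each_file(files, run_spia_on_file)
# report[!report$ok, ]   # files that raised an error, with the message
```

If only some files fail, the failing subset would be a natural place to look for structural differences from the hsa demo files.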
A look into ko_combined_dir:
> head(dir(ko_combined_dir), 16)
[1] "ko00010.xml" "ko00020.xml" "ko00030.xml" "ko00040.xml" "ko00051.xml" "ko00052.xml" "ko00053.xml" "ko00061.xml"
[9] "ko00062.xml" "ko00071.xml" "ko00073.xml" "ko00100.xml" "ko00120.xml" "ko00121.xml" "ko00130.xml" "ko00140.xml"
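Beyond the file names, it may help to look at what each KGML file actually contains, for example how many relation elements it has and which relation subtypes appear. The sketch below assumes the xml2 package is installed and that the files follow the usual KGML layout (entry and relation elements, with subtype children carrying a name attribute). summarise_kgml is a hypothetical helper name, and whether unusual or missing relations are behind the error is an untested guess.

```r
library(xml2)

# Hypothetical helper: count <entry> and <relation> elements in one KGML
# file and tabulate the subtype names found under the relations.
summarise_kgml <- function(path) {
  doc <- read_xml(path)
  relations <- xml_find_all(doc, ".//relation")
  subtypes <- xml_attr(xml_find_all(doc, ".//relation/subtype"), "name")
  list(
    file = basename(path),
    n_entries = length(xml_find_all(doc, ".//entry")),
    n_relations = length(relations),
    subtypes = table(subtypes)
  )
}

# Usage with the directory from this post (commented out because it needs
# the local xml files):
# files <- list.files(ko_combined_dir, pattern = "\\.xml$", full.names = TRUE)
# summaries <- lapply(files, summarise_kgml)
# # Files without any relations:
# vapply(summaries, function(s) s$n_relations == 0, logical(1))
```

Comparing these summaries between the ko files and the hsa demo files shipped with SPIA would show whether the two sets differ in structure.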
Session Info:
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.5.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SPIA_2.52.0 KEGGgraph_1.60.0 org.Tmaccoyii.eg.db_0.1
[4] org.Sjaponicus.eg.db_0.1 ReactomePA_1.44.0 ggnewscale_0.4.9
[7] conflicted_1.2.0.9000 devtools_2.4.5 usethis_2.2.2
[10] cowplot_1.1.1 RColorBrewer_1.1-3 lintr_3.1.0
[13] httpgd_1.3.1 DBI_1.1.3 RSQLite_2.3.1
[16] DT_0.30 R.utils_2.12.2 R.oo_1.25.0
[19] R.methodsS3_1.8.2 png_0.1-8 gridExtra_2.3
[22] pheatmap_1.0.12 readxl_1.4.3 lubridate_1.9.3
[25] forcats_1.0.0 stringr_1.5.0 readr_2.1.4
[28] tidyr_1.3.0 tibble_3.2.1 tidyverse_2.0.0
[31] magick_2.8.0 ashr_2.2-63 ggrepel_0.9.4
[34] ggplot2_3.4.4 DEGreport_1.36.0 scico_1.5.0
[37] apeglm_1.22.1 tximport_1.28.0 dplyr_1.1.3
[40] SBGNview_1.14.0 SBGNview.data_1.14.0 edgeR_3.42.4
[43] limma_3.56.2 pathview_1.40.0 AnnotationHub_3.8.0
[46] BiocFileCache_2.8.0 dbplyr_2.3.4 AnnotationForge_1.42.2
[49] biomaRt_2.56.1 vsn_3.68.0 DOSE_3.26.1
[52] clusterProfiler_4.9.3.002 GO.db_3.17.0 AnnotationDbi_1.62.2
[55] enrichplot_1.20.3 purrr_1.0.2 here_1.0.1
[58] DESeq2_1.40.2 SummarizedExperiment_1.30.2 Biobase_2.60.0
[61] MatrixGenerics_1.12.3 matrixStats_1.0.0 GenomicRanges_1.52.1
[64] GenomeInfoDb_1.36.4 IRanges_2.34.1 S4Vectors_0.38.2
[67] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] progress_1.2.2 urlchecker_1.0.1 Biostrings_2.68.1
[4] vctrs_0.6.4 digest_0.6.33 shape_1.4.6
[7] mixsqp_0.3-48 MASS_7.3-60 reshape_0.8.9
[10] reshape2_1.4.4 SQUAREM_2021.1 httpuv_1.6.11
[13] foreach_1.5.2 qvalue_2.32.0 withr_2.5.1
[16] psych_2.3.9 xfun_0.40 ggfun_0.1.3
[19] ellipsis_0.3.2 memoise_2.0.1 cyclocomp_1.1.1
[22] gson_0.1.0 profvis_0.3.8 systemfonts_1.0.5
[25] tidytree_0.4.5 GlobalOptions_0.1.2 logging_0.10-108
[28] prettyunits_1.2.0 KEGGREST_1.40.1 promises_1.2.1
[31] httr_1.4.7 ps_1.7.5 rstudioapi_0.15.0
[34] miniUI_0.1.1.1 generics_0.1.3 reactome.db_1.84.0
[37] processx_3.8.2 curl_5.1.0 zlibbioc_1.46.0
[40] ggraph_2.1.0 polyclip_1.10-6 GenomeInfoDbData_1.2.10
[43] interactiveDisplayBase_1.38.0 xtable_1.8-4 desc_1.4.2
[46] doParallel_1.0.17 evaluate_0.22 S4Arrays_1.0.6
[49] preprocessCore_1.62.1 hms_1.1.3 irlba_2.3.5.1
[52] colorspace_2.1-0 filelock_1.0.2 magrittr_2.0.3
[55] Rgraphviz_2.44.0 later_1.3.1 viridis_0.6.4
[58] ggtree_3.8.2 lattice_0.21-9 XML_3.99-0.14
[61] shadowtext_0.1.2 pillar_1.9.0 nlme_3.1-163
[64] iterators_1.0.14 compiler_4.3.1 stringi_1.7.12
[67] plyr_1.8.9 crayon_1.5.2 abind_1.4-5
[70] truncnorm_1.0-9 ggdendro_0.1.23 gridGraphics_0.5-1
[73] emdbook_1.3.13 locfit_1.5-9.8 graphlayouts_1.0.1
[76] org.Hs.eg.db_3.17.0 bit_4.0.5 fastmatch_1.1-4
[79] codetools_0.2-19 crosstalk_1.2.0 bslib_0.5.1
[82] GetoptLong_1.0.5 mime_0.12 splines_4.3.1
[85] circlize_0.4.15 Rcpp_1.0.11 HDO.db_0.99.1
[88] cellranger_1.1.0 knitr_1.44 blob_1.2.4
[91] utf8_1.2.3 clue_0.3-65 BiocVersion_3.17.1
[94] fs_1.6.3 Rdpack_2.5 pkgbuild_1.4.2
[97] ggplotify_0.1.2 Matrix_1.6-1.1 callr_3.7.3
[100] tzdb_0.4.0 svglite_2.1.2 tweenr_2.0.2
[103] pkgconfig_2.0.3 tools_4.3.1 cachem_1.0.8
[106] rbibutils_2.2.15 viridisLite_0.4.2 rvest_1.0.3
[109] numDeriv_2016.8-1.1 graphite_1.46.0 fastmap_1.1.1
[112] rmarkdown_2.25 scales_1.2.1 grid_4.3.1
[115] sass_0.4.7 broom_1.0.5 patchwork_1.1.3
[118] coda_0.19-4 BiocManager_1.30.22 graph_1.78.0
[121] farver_2.1.1 tidygraph_1.2.3 scatterpie_0.2.1
[124] yaml_2.3.7 cli_3.6.1 webshot_0.5.5
[127] lifecycle_1.0.3 mvtnorm_1.2-3 sessioninfo_1.2.2
[130] backports_1.4.1 BiocParallel_1.34.2 timechange_0.2.0
[133] gtable_0.3.4 rjson_0.2.21 parallel_4.3.1
[136] ape_5.7-1 jsonlite_1.8.7 rex_1.2.1
[139] bitops_1.0-7 kableExtra_1.3.4 bit64_4.0.5
[142] yulab.utils_0.1.0 bdsmatrix_1.3-6 jquerylib_0.1.4
[145] highr_0.10 GOSemSim_2.27.3 lazyeval_0.2.2
[148] shiny_1.7.5.1 ConsensusClusterPlus_1.64.0 htmltools_0.5.6.1
[151] affy_1.78.2 rappdirs_0.3.3 glue_1.6.2
[154] XVector_0.40.0 RCurl_1.98-1.12 rprojroot_2.0.3
[157] treeio_1.24.3 mnormt_2.1.1 igraph_1.5.1
[160] invgamma_1.1 R6_2.5.1 labeling_0.4.3
[163] cluster_2.1.4 bbmle_1.0.25 pkgload_1.3.3
[166] aplot_0.2.2 DelayedArray_0.26.7 tidyselect_1.2.0
[169] ggforce_0.4.1 xml2_1.3.5 munsell_0.5.0
[172] rsvg_2.6.0 affyio_1.70.0 data.table_1.14.8
[175] htmlwidgets_1.6.2 fgsea_1.26.0 ComplexHeatmap_2.16.0
[178] rlang_1.1.1 remotes_2.4.2.1 fansi_1.0.5
Best,
Benji
Thanks for this insight! I think there may be a slight difference though.
In https://www.genome.jp/kegg/pathway/hsa/hsa03013.png it seems that only the green boxes are included in the pathway, whereas ko03013 includes all possible elements. I'll have to look into this a little more. I may just have to use a closely related organism for this step of my analysis.