I want to run GAGE on my own metagenomic data, which is functionally annotated with KOFam and covers a wide range of species from all domains of life. I can run the test data, but I don't understand most of the tutorials (linked below), so I can't adapt them to my own data.
https://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
https://rdrr.io/bioc/gage/man/kegg.gsets.html
https://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/gage.pdf
First, I know I have to use `species = "ko"` to look up all KEGG IDs rather than those of one specific species, but I'm not sure how to use it (see code below), at what point in the workflow it belongs, or what to do after that.
Another problem is that my IDs are pathway KO IDs or Brite KO IDs, not KO IDs. For example, the pathway ID ko00010 (Glycolysis / Gluconeogenesis) is a different thing from the KO ID K00010 (myo-inositol 2-dehydrogenase / D-chiro-inositol 1-dehydrogenase).
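To make the distinction concrete, here is a minimal base-R sketch that sorts IDs by pattern. The helper name `classify_kegg_id` and the regular expressions are my own assumptions, inferred only from the two examples above (`K` plus five digits for a KO ID, `ko` plus five digits for a pathway ID). Brite IDs may follow a different pattern that this does not cover.

```r
# Hypothetical helper: label each ID by the pattern it matches.
# The patterns are assumptions based on the examples ko00010 and K00010.
classify_kegg_id <- function(ids) {
  out <- rep("other", length(ids))
  out[grepl("^K[0-9]{5}$", ids)]  <- "KO"       # e.g. K00010
  out[grepl("^ko[0-9]{5}$", ids)] <- "pathway"  # e.g. ko00010
  out
}

ids <- c("K00010", "ko00010", "Enzymes with EC numbers")
id_types <- classify_kegg_id(ids)
print(data.frame(id = ids, type = id_types))
```

Anything labelled "other" here would be an entry like the generic categories that carry no ID at all.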
Also, some of the more generic, higher-level (level 1) KOFam annotations in my data don't come with a pathway or Brite KO ID; e.g. "Enzymes with EC numbers" has no ID number. So I don't know whether I can use my KOFam output for GAGE with KEGG at all. I think the Bioconductor package manuals say that you can convert the IDs in your dataset to match the KEGG ones, but I can't figure out how.
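As far as I understand the manuals, `gage()` takes its gene sets (the `gsets` argument) as a named list in which each element is a character vector of gene IDs. Below is a sketch of what that shape would look like if my annotations could be brought into a KO-to-pathway table. The data frame `annot`, its column names, and the KO-to-pathway pairs are all made up for illustration; they are placeholders, not real KEGG assignments, and my real KOFam output is not laid out like this.

```r
# Hypothetical annotation table: one row per (KO ID, pathway ID) pair.
# The pairs are placeholders for illustration, not real KEGG assignments.
annot <- data.frame(
  ko      = c("K00001", "K00002", "K00003", "K00001"),
  pathway = c("ko00010", "ko00010", "ko00020", "ko00020"),
  stringsAsFactors = FALSE
)

# Drop rows with no pathway ID (e.g. generic categories without an ID).
annot <- annot[!is.na(annot$pathway) & annot$pathway != "", ]

# Named list: one character vector of unique KO IDs per pathway.
ko_sets <- lapply(split(annot$ko, annot$pathway), unique)
str(ko_sets)
```

A list of this shape is what I would expect to pass as `gsets`, with the abundance matrix carrying KO IDs as row names, but I have not verified that end to end.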
Can someone please explain in as much detail as possible, step by step, how to use my own data for GAGE with KEGG? I can share my data if necessary. Thanks so much!
```r
library(gage)
kegg.gsets(species = "ko", id.type = "kegg", check.new = FALSE)
```

```
sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Hong Kong SAR.utf8
[2] LC_CTYPE=English_Hong Kong SAR.utf8
[3] LC_MONETARY=English_Hong Kong SAR.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_Hong Kong SAR.utf8
time zone: Asia/Hong_Kong tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] gage_2.54.0 rain_1.38.0 multtest_2.60.0
[4] Biobase_2.64.0 BiocGenerics_0.50.0 gmp_0.7-4
[7] pracma_2.4.4 compositions_2.0-8
loaded via a namespace (and not attached):
[1] KEGGREST_1.44.1 SummarizedExperiment_1.34.0
[3] gtable_0.3.5 tensorA_0.36.2.1
[5] ggplot2_3.5.1 lattice_0.22-6
[7] vctrs_0.6.5 tools_4.4.1
[9] generics_0.1.3 curl_5.2.1
[11] stats4_4.4.1 parallel_4.4.1
[13] RSQLite_2.3.7 AnnotationDbi_1.66.0
[15] tibble_3.2.1 fansi_1.0.6
[17] blob_1.2.4 DEoptimR_1.1-3
[19] pkgconfig_2.0.3 Matrix_1.7-0
[21] S4Vectors_0.42.0 graph_1.82.0
[23] lifecycle_1.0.4 GenomeInfoDbData_1.2.12
[25] compiler_4.4.1 Biostrings_2.72.1
[27] munsell_0.5.1 DESeq2_1.44.0
[29] codetools_0.2-20 GenomeInfoDb_1.40.1
[31] GO.db_3.19.1 pillar_1.9.0
[33] crayon_1.5.3 MASS_7.3-61
[35] BiocParallel_1.38.0 cachem_1.1.0
[37] DelayedArray_0.30.1 abind_1.4-5
[39] robustbase_0.99-3 tidyselect_1.2.1
[41] locfit_1.5-9.10 dplyr_1.1.4
[43] splines_4.4.1 fastmap_1.2.0
[45] grid_4.4.1 colorspace_2.1-0
[47] cli_3.6.3 SparseArray_1.4.8
[49] magrittr_2.0.3 S4Arrays_1.4.1
[51] survival_3.7-0 utf8_1.2.4
[53] scales_1.3.0 UCSC.utils_1.0.0
[55] bit64_4.0.5 XVector_0.44.0
[57] httr_1.4.7 matrixStats_1.3.0
[59] bit_4.0.5 png_0.1-8
[61] memoise_2.0.1 GenomicRanges_1.56.1
[63] IRanges_2.38.0 rlang_1.1.4
[65] Rcpp_1.0.12 DBI_1.2.3
[67] glue_1.7.0 bayesm_3.1-6
[69] rstudioapi_0.16.0 jsonlite_1.8.8
[71] R6_2.5.1 MatrixGenerics_1.16.0
[73] zlibbioc_1.50.0
```