Question

How do I use GAGE with my own metagenomic KOFam data?


cereyredondo • 0


I want to run GAGE on my own metagenomic data, which is functionally annotated with KOFam and covers a wide range of species from all domains of life. I can run the test data, but I don't understand most of the tutorials (listed here), so I can't adapt them to my own data.

- https://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
- https://rdrr.io/bioc/gage/man/kegg.gsets.html
- https://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/gage.pdf

First, I know I have to use `species = "ko"` so that the lookup covers all KEGG IDs rather than a single species, but I'm not sure how to use it (see code below), at what point in the workflow it belongs, or what the remaining steps are.

Another problem is that my IDs are pathway or BRITE IDs (prefix `ko`), not KO IDs (prefix `K`). For example, the pathway ID ko00010 is Glycolysis / Gluconeogenesis, whereas the KO ID K00010 is myo-inositol 2-dehydrogenase / D-chiro-inositol 1-dehydrogenase. They are different things.
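To make the ID distinction concrete, here is a minimal base R sketch that sorts a vector of IDs by type using their prefixes. The `ids` vector is made up for illustration, and the patterns assume KO IDs are `K` plus five digits and pathway or BRITE IDs are `ko` plus five digits.

```r
# Hypothetical IDs, as they might appear in a KOFam-derived annotation table.
ids <- c("K00010", "ko00010", "K00844", "ko01000", "Enzymes with EC numbers")

# KO (ortholog) IDs: capital "K" followed by five digits.
is_ko <- grepl("^K[0-9]{5}$", ids)

# Pathway map and BRITE hierarchy IDs: lowercase "ko" followed by five digits.
is_path_or_brite <- grepl("^ko[0-9]{5}$", ids)

# Anything that matches neither pattern is labelled "other" (e.g. a bare name).
id_type <- ifelse(is_ko, "KO",
                  ifelse(is_path_or_brite, "pathway_or_brite", "other"))

print(data.frame(id = ids, type = id_type))
```

A check like this only classifies the strings; it does not convert a pathway ID into the KO IDs that belong to it.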

Also, some of the more generic, higher-level (1) KOFam annotations in my data don't come with a pathway or BRITE ID; for example, "Enzymes with EC numbers" has no ID number. So I don't know whether I can use my KOFam output for GAGE with KEGG. I think the Bioconductor package manuals say that you can change the IDs in your dataset to match the KEGG ones, but I can't figure out how.
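One way to picture "data whose IDs match the KEGG ones" is a matrix with KO IDs as row names and samples as columns. The sketch below builds such a matrix in base R from a long-format table. The table layout, the column names (`sample`, `ko`, `count`), and the values are all assumptions for illustration, not the actual KOFam output format.

```r
# Hypothetical long-format table: one row per annotated gene call per sample.
annot <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  ko     = c("K00010", "K00844", NA, "K00010", "K00010", "K00844"),
  count  = c(5, 3, 7, 2, 4, 6),
  stringsAsFactors = FALSE
)

# Keep only rows that carry a well-formed KO ID; rows without one are dropped.
has_ko <- !is.na(annot$ko) & grepl("^K[0-9]{5}$", annot$ko)
annot <- annot[has_ko, ]

# Sum counts per KO and sample; KO/sample combinations never seen become NA.
ko_mat <- tapply(annot$count, list(annot$ko, annot$sample), sum)

# Treat unseen combinations as zero counts.
ko_mat[is.na(ko_mat)] <- 0

print(ko_mat)
```

Whether dropping annotations that lack a KO ID is acceptable depends on the analysis; the sketch only shows the mechanics.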

Can someone please explain in detail, step by step, how to use my own data for a KEGG-based GAGE analysis? I can share my data if necessary. Thanks so much!
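For orientation, here is a minimal sketch of the overall shape such an analysis might take, using toy data only. The matrix, the sample grouping, and the three gene sets are hypothetical. With real data, the gene-set list would presumably come from `kegg.gsets(species = "ko")$kg.sets`, on the assumption that those sets are keyed by KO IDs. The `gage()` call is skipped if the package is not installed.

```r
set.seed(1)

# Toy KO-by-sample matrix (hypothetical): 40 KOs, 3 control and 3 treated samples.
kos <- sprintf("K%05d", 1:40)
ko_mat <- matrix(
  rnorm(40 * 6, mean = 8), nrow = 40,
  dimnames = list(kos, c(paste0("ctrl", 1:3), paste0("trt", 1:3)))
)

# Shift the first 12 KOs upward in the treated samples so one set stands out.
ko_mat[1:12, 4:6] <- ko_mat[1:12, 4:6] + 2

# Gene sets as a named list of KO ID vectors (toy stand-ins for KEGG pathways).
gsets <- list(
  "toyA hypothetical pathway A" = kos[1:12],
  "toyB hypothetical pathway B" = kos[13:26],
  "toyC hypothetical pathway C" = kos[27:40]
)

# Column indices of the reference and the experimental samples.
ref_idx  <- 1:3
samp_idx <- 4:6

# The IDs in the gene sets must match the row names of the matrix.
stopifnot(all(unlist(gsets) %in% rownames(ko_mat)))

res <- NULL
if (requireNamespace("gage", quietly = TRUE)) {
  # Samples are not matched pairs in this toy design, hence compare = "unpaired".
  res <- gage::gage(ko_mat, gsets = gsets, ref = ref_idx, samp = samp_idx,
                    compare = "unpaired")
  print(head(res$greater))
}
```

The key design point is that the row names of the data matrix and the entries of the gene-set list use the same ID type; everything else here is placeholder data.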

```r
kegg.gsets(species = "ko", id.type = "kegg", check.new = FALSE)
```

```
sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_Hong Kong SAR.utf8
[2] LC_CTYPE=English_Hong Kong SAR.utf8
[3] LC_MONETARY=English_Hong Kong SAR.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_Hong Kong SAR.utf8

time zone: Asia/Hong_Kong
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] gage_2.54.0 rain_1.38.0 multtest_2.60.0
[4] Biobase_2.64.0 BiocGenerics_0.50.0 gmp_0.7-4
[7] pracma_2.4.4 compositions_2.0-8

loaded via a namespace (and not attached):
[1] KEGGREST_1.44.1 SummarizedExperiment_1.34.0
[3] gtable_0.3.5 tensorA_0.36.2.1
[5] ggplot2_3.5.1 lattice_0.22-6
[7] vctrs_0.6.5 tools_4.4.1
[9] generics_0.1.3 curl_5.2.1
[11] stats4_4.4.1 parallel_4.4.1
[13] RSQLite_2.3.7 AnnotationDbi_1.66.0
[15] tibble_3.2.1 fansi_1.0.6
[17] blob_1.2.4 DEoptimR_1.1-3
[19] pkgconfig_2.0.3 Matrix_1.7-0
[21] S4Vectors_0.42.0 graph_1.82.0
[23] lifecycle_1.0.4 GenomeInfoDbData_1.2.12
[25] compiler_4.4.1 Biostrings_2.72.1
[27] munsell_0.5.1 DESeq2_1.44.0
[29] codetools_0.2-20 GenomeInfoDb_1.40.1
[31] GO.db_3.19.1 pillar_1.9.0
[33] crayon_1.5.3 MASS_7.3-61
[35] BiocParallel_1.38.0 cachem_1.1.0
[37] DelayedArray_0.30.1 abind_1.4-5
[39] robustbase_0.99-3 tidyselect_1.2.1
[41] locfit_1.5-9.10 dplyr_1.1.4
[43] splines_4.4.1 fastmap_1.2.0
[45] grid_4.4.1 colorspace_2.1-0
[47] cli_3.6.3 SparseArray_1.4.8
[49] magrittr_2.0.3 S4Arrays_1.4.1
[51] survival_3.7-0 utf8_1.2.4
[53] scales_1.3.0 UCSC.utils_1.0.0
[55] bit64_4.0.5 XVector_0.44.0
[57] httr_1.4.7 matrixStats_1.3.0
[59] bit_4.0.5 png_0.1-8
[61] memoise_2.0.1 GenomicRanges_1.56.1
[63] IRanges_2.38.0 rlang_1.1.4
[65] Rcpp_1.0.12 DBI_1.2.3
[67] glue_1.7.0 bayesm_3.1-6
[69] rstudioapi_0.16.0 jsonlite_1.8.8
[71] R6_2.5.1 MatrixGenerics_1.16.0
[73] zlibbioc_1.50.0
```

kegg.gsets gageData gage KEGG

12 months ago • cereyredondo