Hello,
I'm attempting to use the GOstats package to do a differential KEGG pathway analysis in wheat (non-model organism without an organism package).
I'm following the GOstats guidelines for unsupported organisms. The problem is at step 1.3, when calling KEGGFrame on a KEGG ID/gene ID data frame: the function stops with a message saying the KEGG IDs are not valid. The same error occurs when I replace "taes" with "eco" in keggLink and pass that data to KEGGFrame.
The KEGG IDs were obtained with KEGGREST, which suggests they are legitimate (and searching for them on the KEGG website returns valid genes). Has anyone seen this before, and does anyone have ideas on how to get around it? I've also tried skipping the stringr step that strips the text from the path IDs, but that didn't help.
I've just installed a fresh copy of R 4.4 and updated all packages through RStudio, although I write my code in Visual Studio Code.
Hope someone is able to help. Best regards, GV Yoshikawa
> keggpath <- keggLink("pathway", "taes")
> keggpath_df <- data.frame(path_id = keggpath,
+ kegg_id = names(keggpath),
+ stringsAsFactors = FALSE)
> head(keggpath_df)
path_id kegg_id
1 path:taes00010 taes:100037593
2 path:taes00010 taes:100038341
3 path:taes00010 taes:100125727
4 path:taes00010 taes:100415821
5 path:taes00010 taes:100415882
6 path:taes00010 taes:100682413
> keggframeData <- keggpath_df %>%
+ dplyr::select(kegg_id, path_id) %>%
+ mutate(path_id = str_remove(path_id, "^.*:"),
+ kegg_id = str_remove(kegg_id, "^.*:"))
> head(keggframeData)
kegg_id path_id
1 100037593 taes00010
2 100038341 taes00010
3 100125727 taes00010
4 100415821 taes00010
5 100415882 taes00010
6 100682413 taes00010
> keggframeData$kegg_id <- as.character(keggframeData$kegg_id)
> keggframeData$path_id <- as.character(keggframeData$path_id)
> keggFrame <- KEGGFrame(keggframeData)
Error in KEGGFrame(keggframeData) :
None of elements in the 1st column of your data.frame object are legitimate KEGG IDs.
>
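One possible cause, offered as an untested sketch: as far as I can tell, the example in the GOstats vignette builds its data frame with the pathway ID in the first column and the gene ID in the second, and its pathway IDs are bare five-digit numbers with no organism prefix. The data frame above has the columns the other way round and keeps the "taes" prefix. The block below rebuilds the six rows shown by head(keggpath_df) in that layout, using base R only. The required format for column 1, the column name gene_id, and the organism string are assumptions here, not something confirmed against the AnnotationDbi source.

```r
# Rows copied from the head(keggpath_df) output above; no network call needed.
keggpath_df <- data.frame(
  path_id = rep("path:taes00010", 6),
  kegg_id = c("taes:100037593", "taes:100038341", "taes:100125727",
              "taes:100415821", "taes:100415882", "taes:100682413"),
  stringsAsFactors = FALSE
)

# Assumption: column 1 holds bare five-digit pathway numbers ("00010") and
# column 2 holds the gene IDs, mirroring the layout used in the vignette.
keggframeData <- data.frame(
  path_id = sub("^path:[a-z]+", "", keggpath_df$path_id),  # drop "path:taes"
  gene_id = sub("^.*:", "", keggpath_df$kegg_id),          # drop "taes:"
  stringsAsFactors = FALSE
)

# Check the assumed format before handing the data to KEGGFrame().
stopifnot(all(grepl("^[0-9]{5}$", keggframeData$path_id)))
print(head(keggframeData))

# Not run here: needs AnnotationDbi attached, and is untested on wheat data.
# keggFrame <- KEGGFrame(keggframeData, organism = "Triticum aestivum")
```

If KEGGFrame still rejects the data in this layout, the assumption about the column-1 format is wrong and the validation code in AnnotationDbi would be the next place to look.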
> sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8
time zone: Australia/Adelaide
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] stringr_1.5.1 GSEABase_1.68.0 annotate_1.84.0
[4] XML_3.99-0.17 GOstats_2.72.0 graph_1.84.0
[7] Category_2.72.0 Matrix_1.7-1 AnnotationDbi_1.68.0
[10] IRanges_2.40.0 S4Vectors_0.44.0 Biobase_2.66.0
[13] BiocGenerics_0.52.0 dplyr_1.1.4 KEGGREST_1.46.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 generics_0.1.3 bitops_1.0-9
[4] stringi_1.8.4 RSQLite_2.3.7 lattice_0.22-6
[7] magrittr_2.0.3 grid_4.4.1 genefilter_1.88.0
[10] GO.db_3.20.0 fastmap_1.2.0 blob_1.2.4
[13] jsonlite_1.8.9 GenomeInfoDb_1.42.0 DBI_1.2.3
[16] survival_3.7-0 httr_1.4.7 fansi_1.0.6
[19] UCSC.utils_1.2.0 Rgraphviz_2.50.0 Biostrings_2.74.0
[22] cli_3.6.2 rlang_1.1.3 crayon_1.5.3
[25] XVector_0.46.0 AnnotationForge_1.48.0 splines_4.4.1
[28] bit64_4.5.2 withr_3.0.2 cachem_1.1.0
[31] tools_4.4.1 memoise_2.0.1 GenomeInfoDbData_1.2.13
[34] curl_5.2.3 vctrs_0.6.5 R6_2.5.1
[37] png_0.1-8 matrixStats_1.4.1 lifecycle_1.0.4
[40] zlibbioc_1.52.0 RBGL_1.82.0 bit_4.5.0
[43] pkgconfig_2.0.3 pillar_1.9.0 glue_1.8.0
[46] tibble_3.2.1 tidyselect_1.2.1 MatrixGenerics_1.18.0
[49] xtable_1.8-4 compiler_4.4.1 RCurl_1.98-1.16
>