kegg.gsets: unknown species
mel.p (@melp-21968), France

Hi all,

I have a script that finds KEGG pathways, and it works perfectly for human and mouse. I am now trying to use it with a non-model organism, Stylophora pistillata, which is in the KEGG organism list (see "spis" at https://www.genome.jp/kegg/catalog/org_list.html). In my script, I use the kegg.gsets function from the gage package, as below:

kg.spis=kegg.gsets("spis")

However I got this error message:

Note: Unknown species 'spis'!
Error in kegg.species.code(species, na.rm = T, code.only = F) :
  All species are invalid!
Calls: kegg.gsets -> kegg.species.code
Execution halted

Since the species exists in KEGG, why does gage not find it?

Thanks in advance for your help.

Melanie

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE 13.2 (Harlequin) (x86_64)

Matrix products: default
BLAS:   /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=en_GB.utf8    LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] forcats_0.4.0        stringr_1.4.0        dplyr_0.7.6
 [4] purrr_0.3.2          readr_1.3.1          tidyr_0.8.3
 [7] tibble_2.1.1         ggplot2_3.2.0        tidyverse_1.2.1
[10] gageData_2.22.0      gage_2.34.0          pathview_1.24.0
[13] org.Hs.eg.db_3.5.0   AnnotationDbi_1.46.1 IRanges_2.18.2
[16] S4Vectors_0.22.1     Biobase_2.38.0       BiocGenerics_0.30.0
[19] KEGGREST_1.24.0

loaded via a namespace (and not attached):
 [1] httr_1.3.1        bit64_0.9-7       jsonlite_1.6      modelr_0.1.4
 [5] assertthat_0.2.1  blob_1.1.1        cellranger_1.1.0  pillar_1.3.1
 [9] RSQLite_2.1.1     backports_1.1.4   lattice_0.20-38   glue_1.3.1
[13] digest_0.6.18     XVector_0.24.0    rvest_0.3.4       colorspace_1.4-1
[17] XML_3.98-1.9      pkgconfig_2.0.2   broom_0.5.2       haven_2.1.0
[21] zlibbioc_1.24.0   scales_1.0.0      generics_0.0.2    withr_2.1.2
[25] lazyeval_0.2.1    cli_1.1.0         magrittr_1.5      crayon_1.3.4
[29] readxl_1.3.1      memoise_0.2.1     KEGGgraph_1.38.0  fansi_0.3.0
[33] nlme_3.1-140      xml2_1.1.1        graph_1.56.0      tools_3.6.1
[37] hms_0.4.2         munsell_0.5.0     bindrcpp_0.2.2    Biostrings_2.52.0
[41] compiler_3.6.1    rlang_0.4.0       grid_3.6.1        rstudioapi_0.8
[45] gtable_0.3.0      DBI_1.0.0         curl_3.2          R6_2.4.0
[49] lubridate_1.7.4   bit_1.1-14        utf8_1.1.4        bindr_0.1.1
[53] Rgraphviz_2.22.0  stringi_1.2.4     Rcpp_1.0.1        png_0.1-7
[57] tidyselect_0.2.5


I've never used gage and I don't want to mess up your script that you already have working, but another possibility would be to use the package KEGGREST.

> library(KEGGREST)
> head(keggList("spis"))
                                                spis:111318821
                        "uncharacterized protein LOC111318821"
                                                spis:111318822
                        "uncharacterized protein LOC111318822"
                                                spis:111318823
                                              "caspase-3-like"
                                                spis:111318825
                        "uncharacterized protein LOC111318825"
                                                spis:111318826
                        "uncharacterized protein LOC111318826"
                                                spis:111318824
"uncharacterized aarF domain-containing protein kinase 1-like"
> query <- keggGet(c("spis:111318821", "spis:111318823"))
> query[[1]]
$ENTRY
        CDS
"111318821"

$DEFINITION
[1] "(RefSeq) uncharacterized protein LOC111318821"

$ORGANISM
                   spis
"Stylophora pistillata"

$POSITION
[1] "Unknown"

$MOTIF
[1] "Pfam: BTB_2 TLD BTB_3"

$DBLINKS
[1] "NCBI-GeneID: 111318821"       "NCBI-ProteinID: XP_022777424"

$AASEQ
  A AAStringSet instance of length 1
    width seq
[1]   330 MMEDNSTGNQVLEDAGNQIREACEVLEREATRLR...CSYQCPTGQNAYTFQAGVKNFIVTDYEVFELHR

$NTSEQ
  A DNAStringSet instance of length 1
    width seq
[1]   993 ATGATGGAAGATAATTCAACTGGCAATCAAGTTC...ACAGATTACGAAGTGTTTGAACTTCACAGATGA

>
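If the goal is a set of gene sets for GAGE rather than single-gene records, one possible route is keggLink(), which returns the gene-to-pathway links of an organism as a named character vector. Below is a minimal sketch, assuming that shape (names are gene IDs such as "spis:111318821", values are pathway IDs with a "path:" prefix). The helper link_to_gsets is hypothetical, and the toy IDs are made up purely to illustrate the input format:

```r
# Hypothetical helper: turn a gene -> pathway link vector (the shape
# returned by KEGGREST::keggLink("pathway", "<org>")) into a named list
# of gene sets, one character vector of gene IDs per pathway.
link_to_gsets <- function(link) {
  genes    <- sub("^[^:]+:", "", names(link))  # drop the organism prefix, e.g. "spis:"
  pathways <- sub("^path:", "", unname(link))  # drop the "path:" prefix
  split(genes, pathways)
}

# Toy input with made-up IDs, only to show the expected shape.
toy_link <- c("xxx:1" = "path:xxx00010",
              "xxx:2" = "path:xxx00010",
              "xxx:2" = "path:xxx00020")
toy_gsets <- link_to_gsets(toy_link)
str(toy_gsets)

# With network access, the real call would look like this:
# library(KEGGREST)
# spis_gsets <- link_to_gsets(keggLink("pathway", "spis"))
```

The resulting list has the general form that gage() takes as its gsets argument. This assumes the expression data are keyed by the same gene IDs, which for spis appear to be NCBI Gene IDs judging by the DBLINKS field above.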

Hope this helps a bit!


Hi! Thanks a lot for your reply. Indeed, KEGGREST seems to be a better solution in this case. I guess I am going to write a separate script for non-model organisms. Thanks for the example and for your help :)


Contrary to what you might expect, the function kegg.gsets() does NOT download the pathway information directly from the KEGG database; it uses a local copy of the pathway information instead (or downloads this file from the website of the gage authors). The relevant code in kegg.gsets() is at lines 37-44:

if (!exists("khier")) {
    if (check.new) {
        si = try(load(url("https://pathview.uncc.edu/data/khier.rda")))
        if (class(si) == "try-error") 
            data(khier, package = "gage")
    }
    else data(khier, package = "gage")
}
  

Although KEGG has pathway info on your organism of interest, apparently none of it is present in the gage pathway file. So you can either download this info yourself, e.g. with the KEGGREST package (as mentioned in the other reply), or get in touch with the authors of the gage package to ask whether they are willing to update their KEGG pathway file.
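The traceback in the question shows the failure occurring inside kegg.species.code(), so a quick way to confirm that the species is simply missing from the data bundled with gage is to look it up in the package's species table. This is only a sketch: it assumes the table is available as the korg dataset in gage and that the KEGG organism code sits in one of its first three columns.

```r
library(gage)

# Assumption: gage ships its species table as the `korg` dataset, with the
# KEGG organism code among the first three columns.
data(korg, package = "gage")
head(korg[, 1:3])

# TRUE if "spis" appears in the bundled table, FALSE otherwise.
spis_known <- any(korg[, 1:3] == "spis", na.rm = TRUE)
spis_known
```

If this returns FALSE, the installed copy of the table does not list the organism, which would be consistent with the "Unknown species" note.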


Hi, thank you for your reply. I think you're right, my species is not in the gage pathway file. I will try the KEGGREST package as mentioned above. Thank you for your help and your time.
