pathview says my nonmodel species is unknown: "species invalid"
nute11a • 0
@nute11a-22763
Last seen 24 days ago
United States

Hello!

I work with a nonmodel organism that does have a valid KEGG species code listed here: https://www.genome.jp/kegg/pathway.html

This is the command I tried using:

test <- pathview(gene.data=geneList, pathway.id="00440", species="MyKEGGSpeciesCode")

And this is my error:

Note: Unknown species 'MyKEGGSpeciesCode'!
Error in kegg.species.code(species, na.rm = T, code.only = FALSE) : All species are invalid!

What I have tried:

detach("package:pathview", unload = TRUE)
devtools::install_github("javadnoorb/pathview")
library(pathview)


detach("package:KEGGREST", unload = TRUE)
BiocManager::install("KEGGREST")
library(KEGGREST)

I also tried the same command with human ("hsa") in place of "MyKEGGSpeciesCode", and that did work...

I know the KEGG identifier for my species is valid, because this download worked:

Pathways <- read.table("http://rest.kegg.jp/list/pathway/MyKEGGSpeciesCode", quote="", sep="\t")

I already used the gage package for the KEGG enrichment, I just want to be able to visualize my results.

Please help D: thank you!!!!!

KEGGREST KEGG gage pathview • 366 views

What is the non-model organism that you are working with?

@james-w-macdonald-5106
Last seen 1 day ago
United States

If you have a character object called 'MyKEGGSpeciesCode' that holds your species code, then you have to pass that object to the function, not a character string that merely spells out the name of the object!

I wouldn't do what you have done regardless. The amount of typing required to do

MyKEGGSpeciesCode <- "ppyr"
test <- pathview(gene.data=geneList, pathway.id="00440", species=MyKEGGSpeciesCode)  # <--- note: no quotes

is more than

test <- pathview(gene.data=geneList, pathway.id="00440", species="ppyr")

and doesn't improve things in any obvious way.


My answer was a bit of a word salad, so perhaps this will illustrate better what I meant.

## create a character object
> MyKEGGSpeciesCode <- "ppyr"

## the object value
> MyKEGGSpeciesCode
[1] "ppyr"

## a string that looks like it might be useful
## but is just a string
> "MyKEGGSpeciesCode"
[1] "MyKEGGSpeciesCode"

## If you really need to use the string, you can get()
## the object
> get("MyKEGGSpeciesCode")
[1] "ppyr"

Hi James, thanks so much for your response! I actually did:

test <- pathview(gene.data=geneList, pathway.id="00440", species="ppyr")

but thought it would be easier to put in the variable "MyKEGGSpeciesCode" when explaining the problem...unfortunately, the error occurs when I run the above line.


Ah, I get it. So what happens if you provide a species code (which can be an actual species code or the genus and species or even a common name) is that an RDA file containing mappings between those choices is downloaded and used to convert. If your species isn't in that file (it isn't - although hating on one of my all time favorite insects is a travesty), then you get the error you see. But you can instead say species = "ko" and you will get an orthology map which should be fine.
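To make the lookup described above concrete, here is a minimal base-R sketch of that kind of name-to-code conversion. It is only an illustration under stated assumptions: `toy_korg` is a made-up two-row stand-in for the real mapping table, and `lookup_species` is a hypothetical helper, not pathview's actual implementation.

```r
# Toy stand-in for the species mapping table (assumption: the real table has
# many more rows and columns; only these three columns are mimicked here).
toy_korg <- matrix(
  c("hsa", "Homo sapiens",    "human",
    "ptr", "Pan troglodytes", "chimpanzee"),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("kegg.code", "scientific.name", "common.name"))
)

# Hypothetical helper: accept a KEGG code, a scientific name, or a common name
# and return the matching KEGG code, or NA if the species is not in the table.
lookup_species <- function(species, tbl = toy_korg) {
  hit <- which(tbl == species, arr.ind = TRUE)
  if (nrow(hit) == 0) return(NA_character_)
  unname(tbl[hit[1, "row"], "kegg.code"])
}

lookup_species("human")            # found via the common name
lookup_species("Pan troglodytes")  # found via the scientific name
lookup_species("ppyr")             # not in the toy table, so NA
```

A species that is valid at KEGG but missing from the local table falls through to the NA branch, which is the situation that produces the "All species are invalid!" error in this thread.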


A while ago I faced the same problem with another non-model organism. I was able to get it working using a 'hack'.

This hack is nothing more than manually generating your own korg object that contains all relevant information.

As James already said, the korg object is present as an RDA (= data) file within the pathview package, and is installed locally on your computer. This object links KEGG's own taxonomy ids, organism codes and gene ids, and the scientific and common names of species with other (external) ids. pathview needs this information before it can query the KEGG database online. The issue is that this RDA file has not been updated recently, so your non-model organism is not (yet) included in this cross-mapping file. See also section 8.5 (page 19) of the pathview vignette (PDF) for more information on this.

Bottom line: before downloading pathway information from the KEGG server, pathview first needs some basic cross-mapping information that is stored locally on your computer. This mapping information is apparently not up to date.

As said, the trick is to create your own korg object that overrides the one present within the package. To do this, first have a look at what its content actually is.

> data(korg, package="pathview")
> head(korg)
     ktax.id  tax.id  kegg.code scientific.name          
[1,] "T01001" "9606"  "hsa"     "Homo sapiens"           
[2,] "T01005" "9598"  "ptr"     "Pan troglodytes"        
[3,] "T02283" "9597"  "pps"     "Pan paniscus"           
[4,] "T02442" "9595"  "ggo"     "Gorilla gorilla gorilla"
[5,] "T01416" "9601"  "pon"     "Pongo abelii"           
[6,] "T03265" "61853" "nle"     "Nomascus leucogenys"    
     common.name                     entrez.gnodes kegg.geneid ncbi.geneid
[1,] "human"                         "1"           "270"       "270"      
[2,] "chimpanzee"                    "1"           "457329"    "457329"   
[3,] "bonobo"                        "1"           "100984343" "100984343"
[4,] "western lowland gorilla"       "1"           "101131425" "101131425"
[5,] "Sumatran orangutan"            "1"           "100445431" "100445431"
[6,] "northern white-cheeked gibbon" "1"           "100586984" "100586984"
     ncbi.proteinid uniprot     
[1,] "NP_000027"    "P23109"    
[2,] "XP_016783033" "H2Q039"    
[3,] "XP_003822069" "A0A2R9CI03"
[4,] "XP_004054355" "G3QRV7"    
[5,] "XP_024103376" "Q5R9Z0"    
[6,] "XP_030665586" "G1S766"    
>  

Next create your own, single line, korg object. For your specific use case:

>  korg <- cbind("ktax.id" = "T07438", "tax.id" = "7054", "kegg.code" = "ppyr",
+               "scientific.name" = "Photinus pyralis", "common.name" = "common eastern firefly",
+               "entrez.gnodes" = "1", "kegg.geneid" = "116164712", "ncbi.geneid" = "116164712",
+               "ncbi.proteinid" = "XP_031334780", "uniprot" = NA)
>
> korg
     ktax.id  tax.id kegg.code scientific.name    common.name             
[1,] "T07438" "7054" "ppyr"    "Photinus pyralis" "common eastern firefly"
     entrez.gnodes kegg.geneid ncbi.geneid ncbi.proteinid uniprot
[1,] "1"           "116164712" "116164712" "XP_031334780" NA     
>

What to use for ktax.id, tax.id, kegg.code, scientific.name and common.name can be found on the KEGG page you linked to in answer to my question.

Comparing the type of identifier used by KEGG and NCBI shows that, for ppyr, KEGG uses the Entrez gene id as its central identifier; thus entrez.gnodes = 1. For completeness (although I am not sure whether this information is used at all...) I completed korg by adding, for a random ppyr gene, its KEGG/NCBI gene id (the two are identical) and the corresponding NCBI protein id. Since no UniProt id is known for that particular gene, I left it as NA.
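If you build such a row by hand, it is easy to misspell or forget a column. The sketch below wraps the construction in a small check. `korg_columns` and `make_korg_row` are hypothetical names introduced here for illustration; the column names are copied from the head(korg) output above, and the values are the ppyr ones used in this post.

```r
# Column names as shown in the head(korg) output, in the same order.
korg_columns <- c("ktax.id", "tax.id", "kegg.code", "scientific.name",
                  "common.name", "entrez.gnodes", "kegg.geneid",
                  "ncbi.geneid", "ncbi.proteinid", "uniprot")

# Hypothetical helper: build a one-row character matrix with exactly those
# columns, and stop with an informative message if any field is missing.
make_korg_row <- function(...) {
  fields <- c(...)
  missing_cols <- setdiff(korg_columns, names(fields))
  if (length(missing_cols) > 0) {
    stop("missing field(s): ", paste(missing_cols, collapse = ", "))
  }
  matrix(as.character(fields[korg_columns]), nrow = 1,
         dimnames = list(NULL, korg_columns))
}

my_korg <- make_korg_row(
  ktax.id = "T07438", tax.id = "7054", kegg.code = "ppyr",
  scientific.name = "Photinus pyralis", common.name = "common eastern firefly",
  entrez.gnodes = "1", kegg.geneid = "116164712", ncbi.geneid = "116164712",
  ncbi.proteinid = "XP_031334780", uniprot = NA
)
my_korg
```

The result has the same shape as the single-row korg built with cbind(), so it could be assigned to korg in the same way; whether pathview validates these fields any further is not something this sketch tests.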

Now you are ready to analyze your data!

<<EDIT1: 2 weeks later (31 Aug 2022); the 'https-fix' has been implemented in the release version of pathview, i.e. version 1.36.1 and higher. Thus: there is no need anymore to install the development version!! <</EDIT1>>

<<EDIT2: 2 weeks and a day later (1 Sept 2022); Luo Weijun, the author of the package, has updated the korg object. The 'hack' is thus not required anymore. See his post in this thread ("Please update to the latest version of pathview, either the release (1.36.1) or the devel (1.37.1). pathview now supports 8282 KEGG species or 1449 new species beyond 2020. "ppyr" is one of them.") <</EDIT2>>

[[NOT NEEDED ANYMORE]] However, there is one complication to deal with: KEGG switched from http to https connections in June 2022. This change has meanwhile been addressed in the pathview code, but AFAIK it has not made it into the release (yet?). Therefore I install the dev version of pathview first. [[END: NOT NEEDED ANYMORE]]

In a new, fresh R session:

> # devtools::install_github("datapplab/pathview", force=TRUE) #not applicable anymore from version 1.36.1 and higher. See above!
>
> # load library
> library(pathview)
> # generate some sample ppyr-specific input data. I selected 16 ids from the TCA cycle pathway (map ppyr00020).
> # also add random logFC for visualization
> ppyr.ids <- c("116170300", "116167655", "116177965", "116172794", "116171160", "116175872",
+               "116176593", "116164618", "116165686", "116175697", "116172860", "116169310",
+               "116169326", "116168556", "116168785", "116165354")
> data.logFC <- runif(n = length(ppyr.ids), min = -5, max = 5)
> names(data.logFC) <- ppyr.ids
> data.logFC[1:5]
 116170300  116167655  116177965  116172794  116171160 
 2.3525920 -0.7268648  3.6781704  0.4266744 -2.5919095 
>
> # now perform the 'hack'; replace korg. Note: **not** needed when using release >= v1.36.1; see above.
> korg <- cbind("ktax.id" = "T07438", "tax.id" = "7054", "kegg.code" = "ppyr",
+               "scientific.name" = "Photinus pyralis", "common.name" = "common eastern firefly",
+               "entrez.gnodes" = "1", "kegg.geneid" = "116164712", "ncbi.geneid" = "116164712",
+               "ncbi.proteinid" = "XP_031334780", "uniprot" = NA)
> 
> # create the map of the TCA cycle!
> pv.out <- pathview(gene.data =data.logFC, pathway.id = "00020",
+                    species = "ppyr", out.suffix = "Photinus.pyralis",
+                    kegg.native = TRUE)
Warning: No annotation package for the species ppyr, gene symbols not mapped!
Info: Working in directory E:/000test
Info: Writing image file ppyr00020.Photinus.pyralis.png
> 
# The warning indicates that the enzyme codes on the KEGG map cannot be replaced
# by gene symbols because no annotation library for ppyr is installed (and is also not available at Bioconductor).
>
#DONE!

The output is the image file ppyr00020.Photinus.pyralis.png (image not reproduced here).

> sessionInfo()
R version 4.2.0 Patched (2022-05-12 r82348 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pathview_1.31.3

Hello! Thank you so much for your response, I really appreciate it! I will give this a shot and update how it works. Thanks again!


Thank you so much, I'm sorry for my delayed response. I really appreciate this explanation!

Luo Weijun ★ 1.5k
@luo-weijun-1783
Last seen 4 weeks ago
United States

Please update to the latest version of pathview, either the release (1.36.1) or the devel (1.37.1). Pathview now supports 8282 KEGG species or 1449 new species beyond 2020. "ppyr" is one of them.

korg[korg[,3]=="ppyr",]

             ktax.id                   tax.id                kegg.code
            "T07438"                   "7054"                   "ppyr"
     scientific.name              common.name            entrez.gnodes
  "Photinus pyralis" "common eastern firefly"                      "1"
         kegg.geneid              ncbi.geneid           ncbi.proteinid
         "116173017"              "116173017"           "XP_031346164"
             uniprot
                  NA
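A quick way to tell whether an installation is recent enough is to compare version numbers. This is a small sketch, assuming 1.36.1 is the first version with the updated korg, as stated above; `korg_hack_needed` is a hypothetical helper name, not part of pathview.

```r
# Hypothetical helper: TRUE if the given version predates the first release
# that ships the updated korg (assumed here to be 1.36.1, per the post above).
korg_hack_needed <- function(installed_version, first_fixed = "1.36.1") {
  package_version(installed_version) < package_version(first_fixed)
}

korg_hack_needed("1.31.3")  # TRUE: the version shown in the sessionInfo() earlier in this thread
korg_hack_needed("1.36.1")  # FALSE

# With pathview installed, the same check on the local installation would be:
# korg_hack_needed(as.character(utils::packageVersion("pathview")))
```

package_version() compares versions component by component, which avoids the pitfalls of comparing version strings alphabetically.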