pathview says my nonmodel species is unknown: "species invalid"
nute11a ▴ 10
@nute11a-22763

Hello!

I work with a nonmodel organism that does have a valid KEGG species code listed here: https://www.genome.jp/kegg/pathway.html

This is the command I tried using:

test <- pathview(gene.data=geneList, pathway.id="00440", species="MyKEGGSpeciesCode")

And this is my error:

Note: Unknown species 'MyKEGGSpeciesCode'!
Error in kegg.species.code(species, na.rm = T, code.only = FALSE) : All species are invalid!

What I have tried:

detach("package:pathview", unload = TRUE)
devtools::install_github("javadnoorb/pathview")
library(pathview)


detach("package:KEGGREST", unload = TRUE)
BiocManager::install("KEGGREST")
library(KEGGREST)

I also tried the same command with human ("hsa") substituted for "MyKEGGSpeciesCode", and that did work.

I know the KEGG identifier for my species is valid, because this download worked:

Pathways <- read.table("http://rest.kegg.jp/list/pathway/MyKEGGSpeciesCode", quote="", sep="\t")

I already used the gage package for the KEGG enrichment; I just want to be able to visualize my results.

Please help D: thank you!!!!!


What is the non-model organism that you are working with?

Luo Weijun ★ 1.6k
@luo-weijun-1783

Please update to the latest version of pathview, either the release (1.36.1) or the devel (1.37.1). Pathview now supports 8282 KEGG species or 1449 new species beyond 2020. "ppyr" is one of them.

korg[korg[,3]=="ppyr",]

             ktax.id                   tax.id                kegg.code
            "T07438"                   "7054"                   "ppyr"
     scientific.name              common.name            entrez.gnodes
  "Photinus pyralis" "common eastern firefly"                      "1"
         kegg.geneid              ncbi.geneid           ncbi.proteinid
         "116173017"              "116173017"           "XP_031346164"
             uniprot
                  NA
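
A quick way to check whether a given code is covered can be sketched as follows. It runs on a small stand-in matrix so that it works without pathview installed; the two rows are copied from korg output shown in this thread, and `korg_demo` and `is_supported()` are illustrative names, not part of pathview.

```r
# Stand-in for pathview's korg matrix (first five columns only).
korg_demo <- matrix(
  c("T01001", "9606", "hsa",  "Homo sapiens",     "human",
    "T07438", "7054", "ppyr", "Photinus pyralis", "common eastern firefly"),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("ktax.id", "tax.id", "kegg.code",
                          "scientific.name", "common.name"))
)

# Hypothetical helper: TRUE if the KEGG code appears in the kegg.code column.
is_supported <- function(code, korg) {
  code %in% korg[, "kegg.code"]
}

is_supported("ppyr", korg_demo)               # a real KEGG code
is_supported("MyKEGGSpeciesCode", korg_demo)  # a placeholder string, not a code
```

With pathview installed, load the real table with data(korg, package = "pathview") and pass korg instead of korg_demo.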

Hi Luo, thank you so much for your helpful comment. While I was able to get this to work and visualize my interesting KEGG pathways (yay!) I was revisiting old code and got this error again! I checked the version of pathview I'm using and it is pathview_1.36.1

The error:

Warning: No annotation package for the species ppyr, gene symbols not mapped!
Warning: restarting interrupted promise evaluation
Warning: internal error -3 in R_decompress1
Error in keggview.native(plot.data.gene = plot.data.gene, cols.ts.gene = cols.ts.gene, : lazy-load database '/Library/Frameworks/R.framework/Versions/4.2/Resources/library/png/R/png.rdb' is corrupt

Do you have any recommendations for what I should try? Thank you!!

Forgot to add: I ran korg[korg[,3]=="ppyr",] and it worked fine, showing me the same information you posted. But I got the error when I tried to run:

pathv.out <- pathview(gene.data =fc.matrix, pathway.id = "00020", species = "ppyr", out.suffix = "00020_out", kegg.native = TRUE, gene.idtype = "KEGG")


Does a restart of R help?

Does a reinstall of the library png help?

In a fresh R-session, what is the output from BiocManager::valid() ?

The reason I am asking is that, in my hands, the code still works fine:

> library(pathview)
> ppyr.ids <- c("116170300", "116167655", "116177965", "116172794", "116171160", "116175872",
+               "116176593", "116164618", "116165686", "116175697", "116172860", "116169310",
+               "116169326", "116168556", "116168785", "116165354")
> data.logFC <- runif(n = length(ppyr.ids), min = -5, max = 5)
> names(data.logFC) <- ppyr.ids
> data.logFC[1:5]
 116170300  116167655  116177965  116172794  116171160 
-3.2381859 -3.6275867 -3.4668049 -4.0651022 -0.7865763 
> pv.out <- pathview(gene.data =data.logFC, pathway.id = "00020",
+                    species = "ppyr", out.suffix = "Photinus.pyralis",
+                    kegg.native = TRUE)
Warning: No annotation package for the species ppyr, gene symbols not mapped!
Info: Working in directory E:/000test
Info: Writing image file ppyr00020.Photinus.pyralis.png
> 
>
> packageVersion("pathview")
[1] ‘1.38.0’
> sessionInfo()
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pathview_1.38.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10            graph_1.76.0           KEGGgraph_1.58.3      
 [4] AnnotationDbi_1.60.0   XVector_0.38.0         BiocGenerics_0.44.0   
 [7] zlibbioc_1.44.0        IRanges_2.32.0         bit_4.0.5             
[10] R6_2.5.1               rlang_1.0.6            fastmap_1.1.0         
[13] org.Hs.eg.db_3.16.0    blob_1.2.3             httr_1.4.4            
[16] GenomeInfoDb_1.34.6    tools_4.2.2            grid_4.2.2            
[19] Biobase_2.58.0         png_0.1-8              cli_3.6.0             
[22] DBI_1.1.3              bit64_4.0.5            crayon_1.5.2          
[25] GenomeInfoDbData_1.2.9 Rgraphviz_2.42.0       vctrs_0.5.2           
[28] S4Vectors_0.36.1       bitops_1.0-7           KEGGREST_1.38.0       
[31] RCurl_1.98-1.10        cachem_1.0.6           memoise_2.0.1         
[34] RSQLite_2.2.20         compiler_4.2.2         Biostrings_2.66.0     
[37] stats4_4.2.2           XML_3.99-0.13          pkgconfig_2.0.3       
> 
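
The first checks suggested above can be sketched in a few lines, to be run in a fresh R session. This is only a rough diagnostic under the assumption that a corrupt lazy-load database shows up as an error when the package namespace is loaded; `package_loads()` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: report whether a package's namespace loads cleanly.
package_loads <- function(pkg) {
  tryCatch({
    loadNamespace(pkg)
    TRUE
  }, error = function(e) {
    message("Could not load '", pkg, "': ", conditionMessage(e))
    FALSE
  })
}

# png is the package named in the error message; pathview depends on it.
pkgs <- c("png", "pathview")
status <- vapply(pkgs, package_loads, logical(1))
print(status)

# Packages reported as FALSE are candidates for reinstalling, e.g.
# install.packages("png"); afterwards BiocManager::valid() can be used
# to look for out-of-date or mismatched packages.
```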

Restarting R worked :) thanks!!

@james-w-macdonald-5106

If you have a character object called 'MyKEGGSpeciesCode' that holds your species code, then you have to pass that object to the function, not a character string that merely spells out the name of the object!

I wouldn't do what you have done regardless. The amount of typing required to do

MyKEGGSpeciesCode <- "ppyr"
test <- pathview(gene.data=geneList, pathway.id="00440", species=MyKEGGSpeciesCode)  # <--------- NOTICE NO QUOTES

is more than

test <- pathview(gene.data=geneList, pathway.id="00440", species="ppyr")

and doesn't improve things in any obvious way.


My answer was a bit of a word salad, so perhaps this will illustrate better what I meant.

## create a character object
> MyKEGGSpeciesCode <- "ppyr"

## the object value
> MyKEGGSpeciesCode
[1] "ppyr"

## a string that looks like it might be useful
## but is just a string
> "MyKEGGSpeciesCode"
[1] "MyKEGGSpeciesCode"

## If you really need to use the string, you can get()
## the object
> get("MyKEGGSpeciesCode")
[1] "ppyr"

Hi James, thanks so much for your response! I actually did:

test <- pathview(gene.data=geneList, pathway.id="00440", species="ppyr")

but thought it would be easier to put in the variable "MyKEGGSpeciesCode" when explaining the problem...unfortunately, the error occurs when I run the above line.


Ah, I get it. So what happens if you provide a species code (which can be an actual species code or the genus and species or even a common name) is that an RDA file containing mappings between those choices is downloaded and used to convert. If your species isn't in that file (it isn't - although hating on one of my all time favorite insects is a travesty), then you get the error you see. But you can instead say species = "ko" and you will get an orthology map which should be fine.
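
If you go the species = "ko" route, the data have to be named by KEGG ortholog (KO) IDs rather than by species-specific gene IDs. A minimal sketch of that conversion is below. The gene IDs are taken from this thread, but the fold changes and the gene-to-KO lookup (`gene2ko`, with placeholder names like "K_demo1") are made up purely for illustration; a real lookup would have to come from KEGG.

```r
# Gene-level log fold changes named by ppyr gene ID (values invented).
gene_logfc <- c("116170300" = 1.2, "116167655" = -0.8, "116177965" = 2.5)

# HYPOTHETICAL gene -> KO lookup; the KO names are placeholders, not real IDs.
gene2ko <- c("116170300" = "K_demo1",
             "116167655" = "K_demo1",
             "116177965" = "K_demo2")

# Map each gene to its KO and drop genes without a KO assignment.
ko_ids <- gene2ko[names(gene_logfc)]
keep <- !is.na(ko_ids)

# Average the fold changes of genes that share a KO.
ko_logfc <- vapply(split(gene_logfc[keep], ko_ids[keep]), mean, numeric(1))
print(ko_logfc)

# With real KO IDs as names, the call would look something like:
# pv.out <- pathview(gene.data = ko_logfc, pathway.id = "00020", species = "ko")
```

Averaging is just one way to collapse several genes onto one ortholog; taking the value with the largest absolute change would be another reasonable choice.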


A while ago I faced the same problem with another non-model organism. I was able to get it working using a 'hack'.

This hack is nothing more than manually generating your own korg object that contains all relevant information.

As James already said, the korg object is present as an RDA (= data) file within the pathview package, and is installed locally on your computer. This object links KEGG's own taxonomy ids, organism and gene ids, and the scientific and common names of species with other (external) ids. pathview needs this information before it can query the KEGG database online. The issue is that this RDA file has not been updated recently, and therefore your non-model organism is apparently not (yet) included in this cross-mapping file. See also section 8.5 (page 19) in the vignette (PDF) of pathview for more information on this.

Bottom line: before downloading pathway information from the KEGG server, pathview first needs some basic cross-mapping information that is stored locally on your computer. This mapping information apparently is not up to date.

As said, the trick is to create your own korg object that overrides the one present within the package. To do this, first have a look at what it actually contains.

> data(korg, package="pathview")
> head(korg)
     ktax.id  tax.id  kegg.code scientific.name          
[1,] "T01001" "9606"  "hsa"     "Homo sapiens"           
[2,] "T01005" "9598"  "ptr"     "Pan troglodytes"        
[3,] "T02283" "9597"  "pps"     "Pan paniscus"           
[4,] "T02442" "9595"  "ggo"     "Gorilla gorilla gorilla"
[5,] "T01416" "9601"  "pon"     "Pongo abelii"           
[6,] "T03265" "61853" "nle"     "Nomascus leucogenys"    
     common.name                     entrez.gnodes kegg.geneid ncbi.geneid
[1,] "human"                         "1"           "270"       "270"      
[2,] "chimpanzee"                    "1"           "457329"    "457329"   
[3,] "bonobo"                        "1"           "100984343" "100984343"
[4,] "western lowland gorilla"       "1"           "101131425" "101131425"
[5,] "Sumatran orangutan"            "1"           "100445431" "100445431"
[6,] "northern white-cheeked gibbon" "1"           "100586984" "100586984"
     ncbi.proteinid uniprot     
[1,] "NP_000027"    "P23109"    
[2,] "XP_016783033" "H2Q039"    
[3,] "XP_003822069" "A0A2R9CI03"
[4,] "XP_004054355" "G3QRV7"    
[5,] "XP_024103376" "Q5R9Z0"    
[6,] "XP_030665586" "G1S766"    
>  

Next create your own, single line, korg object. For your specific use case:

>  korg <- cbind("ktax.id" = "T07438", "tax.id" = "7054", "kegg.code" = "ppyr",
+               "scientific.name" = "Photinus pyralis", "common.name" = "common eastern firefly",
+               "entrez.gnodes" = "1", "kegg.geneid" = "116164712", "ncbi.geneid" = "116164712",
+               "ncbi.proteinid" = "XP_031334780", "uniprot" = NA)
>
> korg
     ktax.id  tax.id kegg.code scientific.name    common.name             
[1,] "T07438" "7054" "ppyr"    "Photinus pyralis" "common eastern firefly"
     entrez.gnodes kegg.geneid ncbi.geneid ncbi.proteinid uniprot
[1,] "1"           "116164712" "116164712" "XP_031334780" NA     
>

What to use for ktax.id, tax.id, kegg.code, scientific.name and common.name can be found on the KEGG page you linked to in answer to my question.

By checking and comparing the type of identifier that is used by KEGG and NCBI, it turns out that for ppyr KEGG uses the Entrez id as its central identifier. Thus entrez.gnodes = 1. For completeness (although I am not sure whether this information is used at all...) I completed korg by adding, for a random ppyr gene, its KEGG/NCBI gene id (these are thus identical) and the corresponding NCBI protein id. Since no UniProt id is known for that particular gene, I left it NA.
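
To make it harder to mistype a column name, the single-row object can also be built with a small wrapper. This is only a sketch for the hack described here (relevant for older pathview versions that lack the species); `make_korg_row()` is a hypothetical helper, and the column names are the ones shown in the head(korg) output above.

```r
# Column names as they appear in pathview's korg matrix.
korg_cols <- c("ktax.id", "tax.id", "kegg.code", "scientific.name",
               "common.name", "entrez.gnodes", "kegg.geneid", "ncbi.geneid",
               "ncbi.proteinid", "uniprot")

# Hypothetical helper: build a one-row character matrix shaped like korg.
make_korg_row <- function(ktax.id, tax.id, kegg.code, scientific.name,
                          common.name, entrez.gnodes = "1",
                          kegg.geneid = NA, ncbi.geneid = NA,
                          ncbi.proteinid = NA, uniprot = NA) {
  row <- cbind(ktax.id = ktax.id, tax.id = tax.id, kegg.code = kegg.code,
               scientific.name = scientific.name, common.name = common.name,
               entrez.gnodes = entrez.gnodes, kegg.geneid = kegg.geneid,
               ncbi.geneid = ncbi.geneid, ncbi.proteinid = ncbi.proteinid,
               uniprot = uniprot)
  # Fail early if the layout does not match what is expected.
  stopifnot(identical(colnames(row), korg_cols), nrow(row) == 1L)
  row
}

korg <- make_korg_row("T07438", "7054", "ppyr", "Photinus pyralis",
                      "common eastern firefly",
                      kegg.geneid = "116164712", ncbi.geneid = "116164712",
                      ncbi.proteinid = "XP_031334780")
korg
```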

Now you are ready to analyze your data!

<<EDIT1: 2 weeks later (31 Aug 2022); the 'https-fix' has been implemented in release versions of pathview, i.e. version 1.36.1 and higher. Thus: there is no need anymore to install the development version!! <</EDIT1>>

<<EDIT2: 2 weeks + 1 day later (1 Sept 2022); Luo Weijun, the author of the package, has updated the korg object. The 'hack' is thus not required anymore. See his post in this thread ("Please update to the latest version of pathview, either the release (1.36.1) or the devel (1.37.1). Pathview now supports 8282 KEGG species or 1449 new species beyond 2020. "ppyr" is one of them.") <</EDIT2>>

[[NOT NEEDED ANYMORE]] However, one complicating thing to deal with: the fact that KEGG switched to https connections (from http) in June 2022. This change has meanwhile been addressed in the pathview code, but AFAIK it did not get in release (yet?). Therefore I install the dev version of pathview first. [[END: NOT NEEDED ANYMORE]]

In a new, fresh R session:

> # devtools::install_github("datapplab/pathview", force=TRUE) #not applicable anymore from version 1.36.1 and higher. See above!
>
> # load library
> library(pathview)
> # generate some sample ppyr-specific input data. I selected 16 ids from the TCA cycle pathway (map ppyr00020).
> # also add random logFC for visualization
> ppyr.ids <- c("116170300", "116167655", "116177965", "116172794", "116171160", "116175872",
+               "116176593", "116164618", "116165686", "116175697", "116172860", "116169310",
+               "116169326", "116168556", "116168785", "116165354")
> data.logFC <- runif(n = length(ppyr.ids), min = -5, max = 5)
> names(data.logFC) <- ppyr.ids
> data.logFC[1:5]
 116170300  116167655  116177965  116172794  116171160 
 2.3525920 -0.7268648  3.6781704  0.4266744 -2.5919095 
>
> # now perform the 'hack'; replace korg. Note: **not** needed when using release >= v1.36.1; see above.
> korg <- cbind("ktax.id" = "T07438", "tax.id" = "7054", "kegg.code" = "ppyr",
+               "scientific.name" = "Photinus pyralis", "common.name" = "common eastern firefly",
+               "entrez.gnodes" = "1", "kegg.geneid" = "116164712", "ncbi.geneid" = "116164712",
+               "ncbi.proteinid" = "XP_031334780", "uniprot" = NA)
> 
> # create the map of the TCA cycle!
> pv.out <- pathview(gene.data =data.logFC, pathway.id = "00020",
+                    species = "ppyr", out.suffix = "Photinus.pyralis",
+                    kegg.native = TRUE)
Warning: No annotation package for the species ppyr, gene symbols not mapped!
Info: Working in directory E:/000test
Info: Writing image file ppyr00020.Photinus.pyralis.png
> 
# The warning indicates that the enzyme codes on the KEGG map cannot be replaced
# by gene symbols because no annotation library for ppyr is installed (and is also not available at Bioconductor).
>
#DONE!

The output: the pathway map written to ppyr00020.Photinus.pyralis.png (image not included here).

> sessionInfo()
R version 4.2.0 Patched (2022-05-12 r82348 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pathview_1.31.3
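
Since whether the hack is needed depends on the pathview version (1.36.1 or higher per the edits above), a version comparison can be sketched like this. `needs_update()` is a hypothetical helper name; it relies only on base R's package_version().

```r
# Hypothetical helper: TRUE if the given version is older than the minimum.
needs_update <- function(installed, minimum = "1.36.1") {
  package_version(installed) < package_version(minimum)
}

needs_update("1.31.3")  # TRUE: the version from the sessionInfo() above
needs_update("1.38.0")  # FALSE

# For the version actually installed on your machine:
# needs_update(as.character(packageVersion("pathview")))
```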

Hello! Thank you so much for your response, I really appreciate it! I will give this a shot and update how it works. Thanks again!


Thank you so much, I'm sorry for my delayed response. I really appreciate this explanation!
