Many NAs when plotting KEGG pathways with TAIR IDs in pathview
Arctic (@arctic-22506), United States

Dear all,

I'm fairly new to the pathview package (v1.34.00) and to plotting KEGG pathways. When I plot KEGG pathways using TAIR or Entrez IDs, a large fraction (~30%-50%) of the genes are shown as NA. However, when I check the KEGG pages for these genes, they do appear to have corresponding TAIR IDs. For instance:

# KEGG "Photosynthesis" pathway [ath00195]:

## 1. Plotting gene psbA using KEGG ID:

library(pathview)

ath00195 <- pathview(gene.data = c("ArthCp002"), pathway.id = "ath00195", species = "ath", gene.idtype = "KEGG", na.col = "purple")

## 2. Plotting gene psbA using TAIR ID:

ath00195 <- pathview(gene.data = c("ATCG00020"), pathway.id = "ath00195", species = "ath", gene.idtype = "TAIR", na.col = "purple")

## 3. KEGG page for psbA appears to list ATCG00020 as its TAIR ID:

https://www.genome.jp/dbget-bin/www_bget?ath:ArthCp002

Tags: KEGG, TAIR_IDs, KEGGdzPathwaysGEO, pathview
@james-w-macdonald-5106, United States

If you provide pathview with a TAIR ID, it will convert it to an NCBI Gene ID, which is the main ID type used by KEGG. Unfortunately, there isn't a mapping for that ID:

> select(org.At.tair.db, "ATCG00020", "ENTREZID", "TAIR")
'select()' returned 1:1 mapping between keys and columns
TAIR ENTREZID
1 ATCG00020     <NA>


The page for that gene on arabidopsis.org doesn't appear to provide an NCBI Gene ID, and searching at NCBI returns nothing as well, so it appears not to have an NCBI Gene ID.
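To see how much of a whole gene list is affected by missing mappings like this one, it can help to look up all TAIR IDs at once before calling pathview. The sketch below is a helper of our own (`report_unmapped_tair` is a hypothetical name, not part of pathview or AnnotationDbi). It assumes you pass it the named vector returned by `AnnotationDbi::mapIds()` on org.At.tair.db, and that the IDs it reports are roughly the ones pathview would leave as NA, since the conversion described above goes through the same package.

```r
# Hypothetical helper: report which TAIR IDs have no NCBI Gene (Entrez) ID.
# `tair2entrez` is assumed to be a named character vector
# (names = TAIR IDs, values = Entrez IDs or NA), e.g. from
#   AnnotationDbi::mapIds(org.At.tair.db, keys = ids,
#                         column = "ENTREZID", keytype = "TAIR")
report_unmapped_tair <- function(ids, tair2entrez) {
  ids <- unique(toupper(ids))          # TAIR locus codes are upper case
  entrez <- unname(tair2entrez[ids])   # IDs absent from the map also become NA
  unmapped <- ids[is.na(entrez)]
  list(
    unmapped = unmapped,
    fraction_unmapped = length(unmapped) / length(ids)
  )
}

# Toy map: ATCG00020 has no Entrez ID (as in the select() output above);
# "0000" is a made-up placeholder, not a real Entrez ID.
toy_map <- c(ATCG00020 = NA, AT1G01010 = "0000")
res <- report_unmapped_tair(c("ATCG00020", "AT1G01010"), toy_map)
res$unmapped           # "ATCG00020"
res$fraction_unmapped  # 0.5
```

On a real gene list, `fraction_unmapped` gives a quick estimate of how many genes will end up NA before any pathway is drawn.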

Hello James, thank you for your reply, and apologies for my delayed response. I can follow your explanation that in this example the ID conversion fails. But wouldn't the NCBI Gene ID listed on the KEGG page (here, 844802) be the ID we are looking for? In other words, is this a matter of an outdated mapping, or are other factors in play? Doesn't KEGG provide mapping tables that could be used for this conversion? Thanks again,

You can get the mapping from KEGG, and perhaps that's how pathview should do it. But for now it uses the org.At.tair.db package, which is built using data we can download from arabidopsis.org. And if you go to arabidopsis.org and search on that ID, there doesn't appear to be an NCBI Gene ID listed. It may be that KEGG maps the TAIR ID to UniProt and then to NCBI Gene ID, but that is way more complicated than we have the bandwidth to attempt. As it stands, generating the annotation packages the way we do right now is somewhere around 80 hours of work, and it's hard to come by the FTE to do that right before each release, which is a busy time already.
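As a rough user-side workaround along the lines of "you can get the mapping from KEGG", the gaps left by org.At.tair.db could be filled from KEGG's own gene-to-NCBI-Gene-ID table, which the KEGGREST package can download with `keggConv("ncbi-geneid", "ath")`. This is a sketch under stated assumptions, not how pathview works: it assumes that call returns a named vector with KEGG gene IDs as names (e.g. `ath:ArthCp002`) and `ncbi-geneid:` values, and that you supply your own TAIR-to-KEGG lookup (`tair2kegg`, hypothetical), since organellar genes such as psbA use ArthCp-style KEGG IDs rather than their TAIR codes. The helper name `fill_entrez_from_kegg` is also hypothetical.

```r
# Hypothetical helper: fill NA Entrez IDs using KEGG's own mapping.
# tair2entrez: named vector from org.At.tair.db (NA where unmapped)
# tair2kegg:   named vector, TAIR ID -> KEGG gene ID (user-supplied, assumed)
# kegg_conv:   named vector assumed to come from
#              KEGGREST::keggConv("ncbi-geneid", "ath")
fill_entrez_from_kegg <- function(tair2entrez, tair2kegg, kegg_conv) {
  # Drop database prefixes such as "ath:" or "ncbi-geneid:"
  strip_prefix <- function(x) sub("^[^:]+:", "", x)
  kegg2entrez <- setNames(strip_prefix(unname(kegg_conv)),
                          strip_prefix(names(kegg_conv)))
  out <- tair2entrez
  missing <- names(out)[is.na(out)]
  # TAIR IDs without a KEGG lookup stay NA; if a KEGG gene maps to
  # several NCBI IDs, named indexing keeps only the first one.
  kegg_ids <- strip_prefix(tair2kegg[missing])
  out[missing] <- unname(kegg2entrez[kegg_ids])
  out
}

# Toy data using the psbA pair from this thread (ATCG00020 <-> ArthCp002,
# NCBI Gene ID 844802); "0000" is a made-up placeholder.
conv <- c("ath:ArthCp002" = "ncbi-geneid:844802")
org_map <- c(ATCG00020 = NA_character_, AT1G01010 = "0000")
tair2kegg <- c(ATCG00020 = "ath:ArthCp002")
filled <- fill_entrez_from_kegg(org_map, tair2kegg, conv)
filled  # ATCG00020 = "844802", AT1G01010 = "0000"
```

The filled IDs could in principle be passed to pathview with `gene.idtype = "entrez"`, though whether that removes every NA for a given pathway would need checking.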