Question

KEGGREST mapping between genes and pathways

zhang.jianhai ▴ 10

@zhangjianhai-12955

Hello,

I want to run a KEGG pathway enrichment analysis on my rice genes. I tried "clusterProfiler", but it expects Entrez Gene IDs as input and my gene IDs are RAPIDs (e.g. Os02g0617800). I want to keep using the RAPIDs, so I will have to write my own enrichment functions. To do this, I need the mapping between RAPIDs and pathways. How can I get this mapping from KEGGREST?

Regards.

keggrest mapping genes to pathways • 2.9k views

ADD COMMENT • link updated 7.0 years ago by Gordon Smyth 53k • written 7.0 years ago by zhang.jianhai ▴ 10

score 1 · Answer 1 · 2019-02-17

Gordon Smyth 53k

@gordon-smyth

WEHI, Melbourne, Australia

You don't need to convert your IDs. KEGG already provides a mapping between the Japanese rice genome pathways and RAPIDs.

For example, you can use the kegga function in the limma package with species.KEGG="dosa" and it will use RAPIDs directly. To see which RAPIDs KEGG is using, have a look at the gene to pathway annotation:

> library(limma)
> GK <- getGeneKEGGLinks(species.KEGG="dosa")
> head(GK)
           GeneID      PathwayID
1 Os01t0118000-01 path:dosa00010
2 Os01t0147900-01 path:dosa00010
3 Os01t0160100-01 path:dosa00010
4 Os01t0190400-01 path:dosa00010
5 Os01t0191700-01 path:dosa00010
6 Os01t0276700-01 path:dosa00010

Alternatively, to get NCBI Entrez Gene IDs instead:

> GK.Entrez <- getGeneKEGGLinks(species.KEGG="osa")
> head(GK.Entrez)
     GeneID     PathwayID
1 107275630 path:osa00010
2 107277365 path:osa00010
3   4324066 path:osa00010
4   4324263 path:osa00010
5   4324666 path:osa00010
6   4325027 path:osa00010
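
To run the enrichment itself with the "dosa" annotation, something like the following sketch could be used. The wrapper name `rice_kegga` and its arguments are hypothetical and only for illustration. It assumes limma is installed, an internet connection is available (the annotation is downloaded from KEGG), and the input IDs are in the same format as the GeneID column shown above.

```r
# Hypothetical wrapper (illustrative name, not part of limma).
# Assumptions: limma is installed, internet access is available, and
# `de_ids` uses the same ID format as getGeneKEGGLinks(species.KEGG = "dosa").
rice_kegga <- function(de_ids, universe = NULL, n_top = 10L) {
  if (!requireNamespace("limma", quietly = TRUE)) {
    stop("The 'limma' package is required")
  }
  # Over-representation test of the supplied genes in each KEGG pathway
  res <- limma::kegga(de_ids, universe = universe, species.KEGG = "dosa")
  # Return the top-ranked pathways
  limma::topKEGG(res, number = n_top)
}

# Example call (not run here because it needs network access):
# de_ids <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0160100-01")
# rice_kegga(de_ids)
```

If `universe` is left as NULL, kegga falls back to its default background set; passing the full set of genes assayed in the experiment is usually the more careful choice.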

ADD COMMENT • link 7.0 years ago Gordon Smyth 53k

This is very useful, but why is there a "-01" or "-00" at the end of the gene IDs (e.g. Os01t0191700-01)?

ADD REPLY • link 7.0 years ago zhang.jianhai ▴ 10

You'd have to ask KEGG rather than me. Presumably they are transcript version numbers. It might be fine to remove them.

ADD REPLY • link 7.0 years ago Gordon Smyth 53k

I see. Thank you very much for the reply!

ADD REPLY • link 7.0 years ago zhang.jianhai ▴ 10

All the gene IDs contain a "t" (e.g. Os01t0118000) rather than a "g" (e.g. Os05g0532600). Why is that?

ADD REPLY • link 7.0 years ago zhang.jianhai ▴ 10

The ID with "t" is the transcript ID. The ID with "g" is the locus ID. See for example the gene annotation file you can download from here:

https://rapdb.dna.affrc.go.jp/download/irgsp1.html
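
As a minimal sketch of the relationship between the two ID types, assuming that a transcript ID differs from its locus ID only by having "t" in place of "g" plus a trailing "-NN" suffix (an assumption worth checking against the annotation file), the conversion could be done with base R string functions. The helper name `transcript_to_locus` is hypothetical and not part of limma.

```r
# Hypothetical helper (not part of limma): map transcript IDs such as
# "Os01t0118000-01" to locus IDs such as "Os01g0118000".
# Assumption: the two ID types differ only by "t" vs "g" and by a trailing
# "-NN" suffix. IDs that do not match this pattern are returned as NA.
transcript_to_locus <- function(ids) {
  pattern <- "^(Os[0-9]{2})t([0-9]{7})-[0-9]{2}$"
  ifelse(grepl(pattern, ids), sub(pattern, "\\1g\\2", ids), NA_character_)
}

# The six transcript IDs from the getGeneKEGGLinks output earlier in the thread
tx <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0160100-01",
        "Os01t0190400-01", "Os01t0191700-01", "Os01t0276700-01")
locus <- transcript_to_locus(tx)
print(locus)

# The same conversion applied to a small gene-to-pathway table with the
# column layout of getGeneKEGGLinks(). Duplicate rows are dropped because
# several transcripts of one locus can map to the same pathway.
gk <- data.frame(GeneID = tx, PathwayID = "path:dosa00010",
                 stringsAsFactors = FALSE)
gk$GeneID <- transcript_to_locus(gk$GeneID)
gk <- unique(gk)
```

A table converted this way could then be supplied to kegga through its gene.pathway argument, so that locus IDs are used throughout the analysis.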

ADD REPLY • link 7.0 years ago Gordon Smyth 53k

I am working with locus IDs. To convert transcript IDs to locus IDs, should I replace the "t" with a "g"? Based on your link, it seems so.

ADD REPLY • link 7.0 years ago zhang.jianhai ▴ 10

It seems that 50 of the transcript IDs returned by "getGeneKEGGLinks" are not present in the annotation file at the link you shared. Do you know where the transcript IDs in "limma" come from?

ADD REPLY • link 6.2 years ago zhang.jianhai ▴ 10