Question: KEGGREST mapping between genes and pathways
zhang.jianhai wrote (4 weeks ago):

Hello,

I want to run a KEGG pathway enrichment analysis on my rice genes. I tried clusterProfiler, but it expects Entrez Gene IDs and my gene IDs are RAPIDs (e.g. Os02g0617800). I want to keep using the RAPIDs, so I would have to write my own enrichment functions. To do this, I need the mapping between RAPIDs and pathways. How can I get this mapping from KEGGREST?

Regards.

Answer: KEGGREST mapping between genes and pathways
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (4 weeks ago):

You don't need to convert your IDs. KEGG already provides a mapping between the Japanese rice genome pathways and RAPIDs.

For example, you can use the kegga function in the limma package with species.KEGG="dosa" and it will use RAPIDs directly. To see which RAPIDs KEGG is using, have a look at the gene to pathway annotation:

> library(limma)
> GK <- getGeneKEGGLinks(species.KEGG="dosa")
> head(GK)
           GeneID      PathwayID
1 Os01t0118000-01 path:dosa00010
2 Os01t0147900-01 path:dosa00010
3 Os01t0160100-01 path:dosa00010
4 Os01t0190400-01 path:dosa00010
5 Os01t0191700-01 path:dosa00010
6 Os01t0276700-01 path:dosa00010

Alternatively, to get NCBI Entrez Gene IDs instead:

> GK.Entrez <- getGeneKEGGLinks(species.KEGG="osa")
> head(GK.Entrez)
     GeneID     PathwayID
1 107275630 path:osa00010
2 107277365 path:osa00010
3   4324066 path:osa00010
4   4324263 path:osa00010
5   4324666 path:osa00010
6   4325027 path:osa00010
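
For readers who still want a hand-rolled test, as the question proposes, the following is a minimal sketch of a hypergeometric over-representation test run on a two-column gene-to-pathway table with the same layout as GK above. The function name `pathway_enrichment`, the toy IDs (`g1`, `pathA`, ...) and the choice of universe (all genes with at least one pathway annotation) are assumptions made for illustration; they are not taken from limma or KEGG.

```r
# Toy gene-to-pathway table with the same column names as GK above.
# The IDs are placeholders, not real RAPIDs or KEGG pathway IDs.
gene_pathway <- data.frame(
  GeneID    = c("g1", "g2", "g3", "g3", "g4", "g5", "g6"),
  PathwayID = c("pathA", "pathA", "pathA", "pathB", "pathB", "pathB", "pathB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one-sided hypergeometric test for each pathway.
# Assumption: the universe is every gene that appears in the table.
pathway_enrichment <- function(de, gene_pathway) {
  universe <- unique(gene_pathway$GeneID)
  de <- intersect(unique(de), universe)   # drop unannotated genes
  sets <- split(gene_pathway$GeneID, gene_pathway$PathwayID)
  res <- lapply(sets, function(genes) {
    genes <- unique(genes)
    k <- length(intersect(de, genes))     # DE genes in this pathway
    # P(X >= k) when drawing length(de) genes from the universe
    p <- phyper(k - 1, length(genes), length(universe) - length(genes),
                length(de), lower.tail = FALSE)
    data.frame(N = length(genes), DE = k, P.DE = p)
  })
  out <- do.call(rbind, res)              # row names are the pathway IDs
  out[order(out$P.DE), ]
}

result <- pathway_enrichment(c("g1", "g2"), gene_pathway)
print(result)
```

With a real table, `gene_pathway` would be replaced by GK (or a version of it with the gene IDs converted to the form used in the analysis), and the p-values would normally be adjusted for multiple testing, for example with `p.adjust`.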

This is very useful, but why is there a "-01" or "-00" at the end of the gene IDs (e.g. Os01t0191700-01)?

Reply written 4 weeks ago by zhang.jianhai

You'd have to ask KEGG rather than me. Presumably they are transcript version numbers. It might be fine to remove them.

Reply written 4 weeks ago by Gordon Smyth

I see. Thank you very much for the reply!

Reply written 4 weeks ago by zhang.jianhai

All the gene IDs contain a "t" (Os01t0118000) rather than a "g" (Os05g0532600). Why is that?

Reply written 4 weeks ago by zhang.jianhai

The ID with "t" is the transcript ID. The ID with "g" is the locus ID. See for example the gene annotation file you can download from here:

https://rapdb.dna.affrc.go.jp/download/irgsp1.html

Reply written 4 weeks ago by Gordon Smyth

I am working with locus IDs. To convert transcript IDs to locus IDs, should I replace the "t" with a "g"? Based on your link, it seems so.

Reply written 4 weeks ago by zhang.jianhai
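
The conversion asked about above could be sketched as follows. This assumes the ID pattern seen in this thread: "Os", a two-digit chromosome number, "t", seven digits, then a hyphen and a two-digit suffix for transcript IDs, with the locus ID using "g" in place of "t" and carrying no suffix. The helper name `transcript_to_locus` is hypothetical, and the pattern should be checked against the RAP-DB annotation file before relying on it.

```r
# Transcript IDs copied from the getGeneKEGGLinks() output shown earlier
transcript_ids <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0191700-01")

# Hypothetical helper: drop the "-NN" suffix and swap "t" for "g".
# IDs that do not match the assumed pattern are returned unchanged.
transcript_to_locus <- function(ids) {
  sub("^(Os[0-9]{2})t([0-9]{7})-[0-9]{2}$", "\\1g\\2", ids)
}

locus_ids <- transcript_to_locus(transcript_ids)
print(locus_ids)
```

Applied to the GeneID column of GK, several transcripts of one locus would collapse to the same locus ID, so duplicated gene-pathway rows would probably need to be removed afterwards, for example with `unique()`.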