Question: KEGGREST mapping between genes and pathways
3 months ago, zhang.jianhai wrote:

Hello,

I want to run a KEGG pathway enrichment analysis on my rice genes. I tried "clusterProfiler", but it expects Entrez IDs as input and my gene IDs are RAPIDs (e.g. Os02g0617800). I want to keep using the RAPIDs, so I have to write my own enrichment functions. To do this, I need the mapping between RAPIDs and pathways. How can I get this mapping from KEGGREST?

Regards.

Answer: KEGGREST mapping between genes and pathways
3 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

You don't need to convert your IDs. KEGG already provides a mapping between the Japanese rice genome pathways and RAPIDs.

For example, you can use the kegga function in the limma package with species.KEGG="dosa" and it will use RAPIDs directly. To see which RAPIDs KEGG is using, have a look at the gene to pathway annotation:

> library(limma)
> GK <- getGeneKEGGLinks(species.KEGG="dosa")
> head(GK)
           GeneID      PathwayID
1 Os01t0118000-01 path:dosa00010
2 Os01t0147900-01 path:dosa00010
3 Os01t0160100-01 path:dosa00010
4 Os01t0190400-01 path:dosa00010
5 Os01t0191700-01 path:dosa00010
6 Os01t0276700-01 path:dosa00010

Alternatively, to get NCBI Entrez Gene IDs instead:

> GK.Entrez <- getGeneKEGGLinks(species.KEGG="osa")
> head(GK.Entrez)
     GeneID     PathwayID
1 107275630 path:osa00010
2 107277365 path:osa00010
3   4324066 path:osa00010
4   4324263 path:osa00010
5   4324666 path:osa00010
6   4325027 path:osa00010
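
For a hand-written enrichment function, the two-column table returned by getGeneKEGGLinks can be reshaped into one gene set per pathway. The following is only a minimal sketch under stated assumptions: the small `GK` data frame is typed in from the head(GK) output above so that the code runs offline, `pathway_sets`, `my_genes` and `overlap` are hypothetical names, and "Os99t0000000-01" is a made-up placeholder ID that matches nothing.

```r
# Toy copy of the six rows printed by head(GK) above, so the sketch runs offline.
GK <- data.frame(
  GeneID = c("Os01t0118000-01", "Os01t0147900-01", "Os01t0160100-01",
             "Os01t0190400-01", "Os01t0191700-01", "Os01t0276700-01"),
  PathwayID = rep("path:dosa00010", 6),
  stringsAsFactors = FALSE
)

# One character vector of gene IDs per pathway.
pathway_sets <- split(GK$GeneID, GK$PathwayID)

# Hypothetical gene list of interest; it must use the same ID form as GK$GeneID.
# The last entry is a made-up placeholder that is in no pathway.
my_genes <- c("Os01t0118000-01", "Os01t0191700-01", "Os99t0000000-01")

# Number of genes of interest that fall in each pathway.
overlap <- vapply(pathway_sets, function(g) sum(my_genes %in% g), integer(1))
print(overlap)
```

With the full table in place of the toy one, these per-pathway counts could feed an over-representation test of your choice, provided the gene list and the table use the same ID form.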

This is very useful, but why is there a "-01" or "-00" suffix at the end of the gene IDs (e.g. Os01t0191700-01)?

Reply written 3 months ago by zhang.jianhai

You'd have to ask KEGG rather than me. Presumably they are transcript version numbers. It might be fine to remove them.

Reply written 3 months ago by Gordon Smyth
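
If the suffixes are removed, one way to do it is with a regular expression in base R. This is a sketch only, assuming every suffix is a hyphen followed by exactly two digits, as in the IDs shown in this thread; `ids` and `stripped` are hypothetical names.

```r
# Example IDs copied from the head(GK) output in the answer.
ids <- c("Os01t0118000-01", "Os01t0147900-01", "Os01t0191700-01")

# Drop a trailing hyphen followed by exactly two digits; other IDs are left unchanged.
stripped <- sub("-[0-9]{2}$", "", ids)
print(stripped)
```

Several suffixed IDs may collapse to the same stripped ID, so duplicated gene-pathway rows may need to be removed afterwards, for example with unique().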

I see. Thank you very much for the reply!

Reply written 3 months ago by zhang.jianhai

All the gene IDs contain a "t" (Os01t0118000) rather than a "g" (Os05g0532600). Why is that?

Reply written 3 months ago by zhang.jianhai

The ID with "t" is the transcript ID. The ID with "g" is the locus ID. See for example the gene annotation file you can download from here:

https://rapdb.dna.affrc.go.jp/download/irgsp1.html

Reply written 3 months ago by Gordon Smyth

I am working with locus IDs. To convert transcript IDs to locus IDs, should I replace the "t" with a "g"? Based on your link, it seems so.

Reply written 3 months ago by zhang.jianhai
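
The conversion asked about here could be sketched as follows. It rests on an assumption that is not confirmed in this thread and should be checked against the RAP-DB annotation file linked above: that a locus ID differs from its transcript ID only by "g" in place of "t" and by the missing "-NN" suffix. `transcript_to_locus` is a hypothetical helper name, not a function from limma or KEGGREST.

```r
# Hypothetical helper: transcript ID (OsNNtNNNNNNN-NN) -> locus ID (OsNNgNNNNNNN).
# Assumes the two forms differ only by "t" vs "g" and the optional "-NN" suffix.
transcript_to_locus <- function(ids) {
  pattern <- "^(Os[0-9]{2})t([0-9]{7})(-[0-9]{2})?$"
  ok <- grepl(pattern, ids)
  # IDs that do not match the expected form become NA instead of being guessed.
  out <- rep(NA_character_, length(ids))
  out[ok] <- sub(pattern, "\\1g\\2", ids[ok])
  out
}

locus <- transcript_to_locus(c("Os01t0118000-01", "Os01t0191700-01", "not_an_id"))
print(locus)
```

Returning NA for anything that does not look like a transcript ID makes unexpected inputs visible rather than silently passing them through.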