Question: Can't find gene list for KEGG genesets
3 months ago, sup230 wrote:


I used both the gage package in R and the GSEA software for KEGG pathway analysis. Because the two tools use different algorithms, I got slightly different lists of gene sets. For the gene sets reported by GSEA, I was able to find the member genes on the GSEA website, but some of the gene sets reported by GAGE do not appear there. A couple of those are:

hsa04659 Th17 cell differentiation

hsa04380 Osteoclast differentiation 


I could not find these pathways in the GSEA C2:CP:KEGG collection. I did find the list of genes on the KEGG website, but I would like to download this list. Does anyone know how to get the list of genes in these gene sets? Is there an R package that allows you to download the genes in a gene set from the KEGG database?




3 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

library(limma)
df <- getGeneKEGGLinks(species.KEGG = "hsa")

will give you a data.frame linking genes to all the KEGG pathways they belong to. See help("kegga") in the limma package.
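
To pull out the genes of a single pathway from such a table, something along these lines may work. This is a minimal sketch on a toy data.frame, not the real KEGG data: the column names GeneID and PathwayID are assumed to match what getGeneKEGGLinks() returns, the helper genes_in_pathway() is hypothetical, and the "path:" prefix is handled only because some versions appear to include it in the pathway IDs.

```r
# Toy stand-in for the table returned by getGeneKEGGLinks(); the column
# names GeneID and PathwayID are an assumption about the real output.
# Both ID styles (with and without "path:") are included on purpose.
links <- data.frame(
  GeneID    = c("3553", "3554", "5600", "3553"),
  PathwayID = c("path:hsa04659", "path:hsa04659", "hsa04380", "hsa04380"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the unique gene IDs linked to one pathway,
# ignoring an optional "path:" prefix on the pathway IDs.
genes_in_pathway <- function(links, pathway) {
  ids <- sub("^path:", "", links$PathwayID)
  unique(links$GeneID[ids == pathway])
}

th17  <- genes_in_pathway(links, "hsa04659")
osteo <- genes_in_pathway(links, "hsa04380")
```

With the real table, replacing links by df from the call above should give the Entrez Gene IDs for hsa04659 and hsa04380 directly.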

3 months ago, James W. MacDonald (United States) wrote:

There might be a simpler way to do this, but a quick hack is

> library(KEGGREST)
> z <- keggGet("hsa04659")
> zz <- matrix(z[[1]]$GENE, ncol = 2, byrow = TRUE)
> head(zz)
     [,1]
[1,] "3553"
[2,] "3554"
[3,] "3556"
[4,] "5600"
[5,] "6300"
[6,] "5603"
     [,2]
[1,] "IL1B; interleukin 1 beta [KO:K04519]"
[2,] "IL1R1; interleukin 1 receptor type 1 [KO:K04386]"
[3,] "IL1RAP; interleukin 1 receptor accessory protein [KO:K04723]"
[4,] "MAPK11; mitogen-activated protein kinase 11 [KO:K04441] [EC:]"
[5,] "MAPK12; mitogen-activated protein kinase 12 [KO:K04441] [EC:]"
[6,] "MAPK13; mitogen-activated protein kinase 13 [KO:K04441] [EC:]"



Thank you, this worked!

Could you also help me with matching the KEGG gene IDs to gene symbols, Entrez IDs, or Ensembl IDs?

Reply written 3 months ago by sup230

The first column in the matrix contains the Entrez Gene ID, and the second contains the gene symbol followed by the gene name. You can also use an annotation package such as org.Hs.eg.db to annotate if you want an arguably cleaner output.

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, head(zz[,1]), c("SYMBOL","ENSEMBL"))
'select()' returned 1:1 mapping between keys and columns
  ENTREZID SYMBOL         ENSEMBL
1     3553   IL1B ENSG00000125538
2     3554  IL1R1 ENSG00000115594
3     3556 IL1RAP ENSG00000196083
4     5600 MAPK11 ENSG00000185386
5     6300 MAPK12 ENSG00000188130
6     5603 MAPK13 ENSG00000156711

Note that, apart from the Ensembl ID, this is the same as what you already have:

> data.frame(ENTREZID = head(zz[,1]), SYMBOL = sapply(strsplit(head(zz[,2]), ";"), "[", 1))
  ENTREZID SYMBOL
1     3553   IL1B
2     3554  IL1R1
3     3556 IL1RAP
4     5600 MAPK11
5     6300 MAPK12
6     5603 MAPK13
Reply written 3 months ago by James W. MacDonald
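
Since the original question was about downloading the list, the reshape and symbol extraction shown above could be wrapped into a small helper and written to a file. This is only a sketch under stated assumptions: parse_gene_field() is a hypothetical name, and the toy vector gene_field stands in for z[[1]]$GENE, which is assumed to alternate between an Entrez Gene ID and a "SYMBOL; description" string as in the output above.

```r
# Toy stand-in for z[[1]]$GENE from keggGet(): entries are assumed to
# alternate between Entrez Gene ID and "SYMBOL; description".
gene_field <- c(
  "3553", "IL1B; interleukin 1 beta [KO:K04519]",
  "3554", "IL1R1; interleukin 1 receptor type 1 [KO:K04386]"
)

# Hypothetical helper: turn the alternating vector into a data.frame.
parse_gene_field <- function(x) {
  # The reshape only makes sense if IDs and descriptions come in pairs.
  stopifnot(length(x) %% 2 == 0)
  m <- matrix(x, ncol = 2, byrow = TRUE)
  data.frame(
    ENTREZID    = m[, 1],
    SYMBOL      = sub(";.*$", "", m[, 2]),           # text before the first ";"
    DESCRIPTION = trimws(sub("^[^;]*;", "", m[, 2])), # text after the first ";"
    stringsAsFactors = FALSE
  )
}

genes <- parse_gene_field(gene_field)

# Save the table; a temporary file is used here, any path would do.
out <- tempfile(fileext = ".csv")
write.csv(genes, out, row.names = FALSE)
```

The length check is there because the byrow reshape would silently misalign IDs and descriptions if the field ever stopped coming in pairs.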