Question

Can't find gene list for KEGG genesets

sup230 ▴ 30

@sup230-13286

Hi,

I used both the gage package in R and the GSEA software for KEGG pathway analysis. Since the two tools use different algorithms, I got slightly different lists of gene sets. For the GSEA results I was able to find the genes in each gene set on the GSEA website, but for some gene sets reported by GAGE I cannot find a gene list there. A couple of those are:

hsa04659 Th17 cell differentiation

hsa04380 Osteoclast differentiation

I could not find these pathways in the GSEA C2:CP:KEGG collection. I did find the list of genes on the KEGG website (http://www.genome.jp/dbget-bin/www_bget?hsa04659), but I would like to download this list. Does anyone know how to get the list of genes in these gene sets? Is there an R package that lets you download the genes in a gene set from the KEGG database?

Thanks!

kegg geneset genelist gage package • 1.7k views

ADD COMMENT • link updated 6.7 years ago by Gordon Smyth 50k • written 6.7 years ago by sup230 ▴ 30

score 1 · Answer 1 · 2017-08-16

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

library(limma)
df <- getGeneKEGGLinks(species = "hsa")

will give you a data.frame with one row per gene-pathway pair, covering all human KEGG pathways. See help("kegga").
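To get from that table to the gene list for a single pathway, something like the following sketch might help. It assumes the table has the columns GeneID and PathwayID, and that PathwayID may or may not carry a "path:" prefix depending on the limma/KEGG version. The helper name pathway_genes and the toy table are made up for illustration; the toy rows are not real pathway memberships.

```r
# Hypothetical helper: pull the gene IDs for one pathway out of a
# GeneID/PathwayID table such as the one returned by getGeneKEGGLinks().
pathway_genes <- function(links, pathway_id) {
  # Drop a leading "path:" prefix if present, so both ID styles match
  ids <- sub("^path:", "", links$PathwayID)
  unique(as.character(links$GeneID[ids == pathway_id]))
}

# Toy table standing in for the real result, so the sketch runs offline.
# The rows are illustrative only.
toy <- data.frame(
  GeneID    = c("3553", "3554", "3553", "1432"),
  PathwayID = c("path:hsa04659", "path:hsa04659",
                "path:hsa04380", "path:hsa04380"),
  stringsAsFactors = FALSE
)

th17 <- pathway_genes(toy, "hsa04659")
print(th17)

# With the real data (needs limma and internet access):
# library(limma)
# df <- getGeneKEGGLinks(species = "hsa")
# writeLines(pathway_genes(df, "hsa04659"), "hsa04659_genes.txt")
```

The commented lines at the end show how the same helper could be used to write the real gene list to a text file, one ID per line.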

ADD COMMENT • link 6.7 years ago Gordon Smyth 50k

score 0 · Answer 2 · 2017-08-16

James W. MacDonald 65k

@james-w-macdonald-5106

United States

There might be a simpler way to do this, but a quick hack using the KEGGREST package is

> library(KEGGREST)
> z <- keggGet("hsa04659")
> zz <- matrix(z[[1]]$GENE, ncol = 2, byrow = TRUE)
> head(zz)
     [,1]   [,2]
[1,] "3553" "IL1B; interleukin 1 beta [KO:K04519]"
[2,] "3554" "IL1R1; interleukin 1 receptor type 1 [KO:K04386]"
[3,] "3556" "IL1RAP; interleukin 1 receptor accessory protein [KO:K04723]"
[4,] "5600" "MAPK11; mitogen-activated protein kinase 11 [KO:K04441] [EC:2.7.11.24]"
[5,] "6300" "MAPK12; mitogen-activated protein kinase 12 [KO:K04441] [EC:2.7.11.24]"
[6,] "5603" "MAPK13; mitogen-activated protein kinase 13 [KO:K04441] [EC:2.7.11.24]"
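If you want a table you can save to disk, the alternating ID/description vector can be split into columns. This is only a sketch: it assumes the GENE field alternates between an ID and a "SYMBOL; description" string, as in the output above, and the helper name parse_kegg_genes is hypothetical. The toy input reuses the first two entries shown above so that it runs without internet access.

```r
# Hypothetical helper: turn the alternating ID/description vector from
# keggGet(...)[[1]]$GENE into a data.frame with one row per gene.
parse_kegg_genes <- function(gene_field) {
  m <- matrix(gene_field, ncol = 2, byrow = TRUE)
  data.frame(
    ENTREZID    = m[, 1],
    # Everything before the first ";" is the gene symbol
    SYMBOL      = sub(";.*$", "", m[, 2]),
    # Everything after the first ";" is the gene name plus KO/EC tags
    DESCRIPTION = trimws(sub("^[^;]*;", "", m[, 2])),
    stringsAsFactors = FALSE
  )
}

# Toy input copied from the first two genes in the output above
toy_gene <- c(
  "3553", "IL1B; interleukin 1 beta [KO:K04519]",
  "3554", "IL1R1; interleukin 1 receptor type 1 [KO:K04386]"
)

genes <- parse_kegg_genes(toy_gene)
print(genes)

# With the real data (needs KEGGREST and internet access):
# library(KEGGREST)
# genes <- parse_kegg_genes(keggGet("hsa04659")[[1]]$GENE)
# write.csv(genes, "hsa04659_genes.csv", row.names = FALSE)
```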

ADD COMMENT • link 6.7 years ago James W. MacDonald 65k

Thank you, this worked!

Could you also help me with matching the KEGG gene IDs to either gene symbols, Entrez ID, or Ensembl IDs?

ADD REPLY • link 6.7 years ago sup230 ▴ 30

The first column in the matrix contains the Entrez Gene ID, and the second contains the gene symbol, plus the gene name. You can also use the org.Hs.eg.db package to annotate if you want an arguably cleaner output.

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, head(zz[,1]), c("SYMBOL","ENSEMBL"))
'select()' returned 1:1 mapping between keys and columns
  ENTREZID SYMBOL         ENSEMBL
1     3553   IL1B ENSG00000125538
2     3554  IL1R1 ENSG00000115594
3     3556 IL1RAP ENSG00000196083
4     5600 MAPK11 ENSG00000185386
5     6300 MAPK12 ENSG00000188130
6     5603 MAPK13 ENSG00000156711

Note that this is the same (except for the Ensembl ID) as what you already have:

> data.frame(ENTREZID = head(zz[,1]), SYMBOL = sapply(strsplit(head(zz[,2]), ";"), "[", 1))
  ENTREZID SYMBOL
1     3553   IL1B
2     3554  IL1R1
3     3556 IL1RAP
4     5600 MAPK11
5     6300 MAPK12
6     5603 MAPK13

ADD REPLY • link 6.7 years ago James W. MacDonald 65k