Can't find gene list for KEGG genesets
sup230 ▴ 30
@sup230-13286

Hi,

I ran a KEGG pathway analysis with both the gage package in R and the GSEA software. Because the two tools use different algorithms, the resulting lists of gene sets differ slightly. For the gene sets reported by GSEA, I could look up the member genes on the GSEA website, but some of the gene sets reported by gage are not listed there. Two examples:

hsa04659 Th17 cell differentiation

hsa04380 Osteoclast differentiation 

 

I could not find these pathways in the GSEA C2:CP:KEGG collection. The KEGG website does list the genes (http://www.genome.jp/dbget-bin/www_bget?hsa04659), but I would like to download the list. Does anyone know how to get the genes in these gene sets? Is there an R package that can download the genes in a gene set from the KEGG database?

Thanks!

@gordon-smyth
WEHI, Melbourne, Australia

library(limma)
df <- getGeneKEGGLinks(species = "hsa")

This gives you a data.frame with one row per gene-pathway pair, covering all human KEGG pathways. See help("kegga").
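To pull out just the pathways from the question, the table can be filtered by pathway ID. The following is a minimal sketch, not part of limma: `genes_in_pathways` is a hypothetical helper name, and it assumes the table has the columns `GeneID` and `PathwayID`. Depending on the version, the pathway IDs may or may not carry a `path:` prefix, so the sketch strips it if present. A small toy table stands in for the real one so the code runs offline.

```r
# Hypothetical helper: collect the genes for the requested pathways from a
# gene-to-pathway table with columns GeneID and PathwayID.
genes_in_pathways <- function(links, pathway_ids) {
  # Strip a leading "path:" if the table uses that prefix.
  ids <- sub("^path:", "", links$PathwayID)
  keep <- ids %in% pathway_ids
  # One character vector of gene IDs per matching pathway.
  split(as.character(links$GeneID[keep]), ids[keep])
}

# Toy stand-in for the real table. The last row is a made-up placeholder,
# not a real gene or pathway.
toy_links <- data.frame(
  GeneID    = c("3553", "3554", "3556", "0"),
  PathwayID = c("path:hsa04659", "path:hsa04659", "path:hsa04659", "path:hsa99999"),
  stringsAsFactors = FALSE
)

th17 <- genes_in_pathways(toy_links, "hsa04659")
print(th17)

# With limma installed and network access, the real call would look like:
# links <- limma::getGeneKEGGLinks(species.KEGG = "hsa")
# genes_in_pathways(links, c("hsa04659", "hsa04380"))
```

The result is a named list, so `th17$hsa04659` holds the gene IDs for that pathway.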

@james-w-macdonald-5106
United States

There might be a simpler way to do this, but a quick hack using the KEGGREST package is

> library(KEGGREST)
> z <- keggGet("hsa04659")
> zz <- matrix(z[[1]]$GENE, ncol = 2, byrow = TRUE)
> head(zz)
     [,1]  
[1,] "3553"
[2,] "3554"
[3,] "3556"
[4,] "5600"
[5,] "6300"
[6,] "5603"
     [,2]                                                                    
[1,] "IL1B; interleukin 1 beta [KO:K04519]"                                  
[2,] "IL1R1; interleukin 1 receptor type 1 [KO:K04386]"                      
[3,] "IL1RAP; interleukin 1 receptor accessory protein [KO:K04723]"          
[4,] "MAPK11; mitogen-activated protein kinase 11 [KO:K04441] [EC:2.7.11.24]"
[5,] "MAPK12; mitogen-activated protein kinase 12 [KO:K04441] [EC:2.7.11.24]"
[6,] "MAPK13; mitogen-activated protein kinase 13 [KO:K04441] [EC:2.7.11.24]"
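If a tidier table is wanted, the alternating ID/description vector can be split into separate columns. This is only a sketch under the assumption that the `GENE` field always alternates between an ID and a "SYMBOL; name [annotations]" string, as in the output above. `parse_kegg_genes` is a hypothetical helper name, and the input is hard-coded from the six entries shown so the code runs without network access.

```r
# First six entries of z[[1]]$GENE as printed above: IDs and descriptions alternate.
gene_field <- c(
  "3553", "IL1B; interleukin 1 beta [KO:K04519]",
  "3554", "IL1R1; interleukin 1 receptor type 1 [KO:K04386]",
  "3556", "IL1RAP; interleukin 1 receptor accessory protein [KO:K04723]",
  "5600", "MAPK11; mitogen-activated protein kinase 11 [KO:K04441] [EC:2.7.11.24]",
  "6300", "MAPK12; mitogen-activated protein kinase 12 [KO:K04441] [EC:2.7.11.24]",
  "5603", "MAPK13; mitogen-activated protein kinase 13 [KO:K04441] [EC:2.7.11.24]"
)

# Hypothetical helper: turn the alternating vector into a three-column data frame.
parse_kegg_genes <- function(gene_field) {
  stopifnot(length(gene_field) %% 2 == 0)
  ids  <- gene_field[c(TRUE, FALSE)]  # odd positions: gene IDs
  desc <- gene_field[c(FALSE, TRUE)]  # even positions: "SYMBOL; name [..]"
  # Drop the symbol, then drop everything from the first "[" onwards.
  gene_name <- sub("\\s*\\[.*$", "", sub("^[^;]*;", "", desc))
  data.frame(
    ENTREZID = ids,
    SYMBOL   = sub(";.*$", "", desc),  # text before the first semicolon
    GENENAME = trimws(gene_name),
    stringsAsFactors = FALSE
  )
}

genes <- parse_kegg_genes(gene_field)
print(genes)
```

With the real object, the call would be `parse_kegg_genes(z[[1]]$GENE)`. Entries without a semicolon would end up with the full description in `SYMBOL`, so it is worth checking the result by eye.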

 


Thank you, this worked!

Could you also help me map the KEGG gene IDs to gene symbols, Entrez IDs, or Ensembl IDs?


The first column in the matrix contains the Entrez Gene ID, and the second contains the gene symbol, plus the gene name. You can also use the org.Hs.eg.db package to annotate if you want an arguably cleaner output.

> library(org.Hs.eg.db)
> select(org.Hs.eg.db, head(zz[,1]), c("SYMBOL","ENSEMBL"))
'select()' returned 1:1 mapping between keys and columns
  ENTREZID SYMBOL         ENSEMBL
1     3553   IL1B ENSG00000125538
2     3554  IL1R1 ENSG00000115594
3     3556 IL1RAP ENSG00000196083
4     5600 MAPK11 ENSG00000185386
5     6300 MAPK12 ENSG00000188130
6     5603 MAPK13 ENSG00000156711

Apart from the Ensembl IDs, this is the same information you already have:

> data.frame(ENTREZID = head(zz[,1]), SYMBOL = sapply(strsplit(head(zz[,2]), ";"), "[", 1))
  ENTREZID SYMBOL
1     3553   IL1B
2     3554  IL1R1
3     3556 IL1RAP
4     5600 MAPK11
5     6300 MAPK12
6     5603 MAPK13
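
Since the original question was about downloading the list, the table can then be written to a tab-delimited file. A minimal sketch using base R only: the data frame is typed in from the output above, and the file name and temporary directory are placeholders to swap for a real path.

```r
# Table from the output above (first six genes of hsa04659).
gene_table <- data.frame(
  ENTREZID = c("3553", "3554", "3556", "5600", "6300", "5603"),
  SYMBOL   = c("IL1B", "IL1R1", "IL1RAP", "MAPK11", "MAPK12", "MAPK13"),
  stringsAsFactors = FALSE
)

# Placeholder output location; a temporary directory keeps the sketch harmless.
out_file <- file.path(tempdir(), "hsa04659_genes.tsv")

# Tab-separated, no quotes, no row names, so the file opens cleanly elsewhere.
write.table(gene_table, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back, keeping the IDs as character rather than numbers.
check <- read.table(out_file, header = TRUE, sep = "\t",
                    colClasses = "character", stringsAsFactors = FALSE)
print(check)
```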