Question

How can I get a list of KEGG pathways and the genes in each?

malik.yousef ▴ 10

Hi

How can I get the list of genes for each KEGG pathway? I need a simple text table in which each row holds a KEGG pathway in the first column and the list of genes for that pathway in the next column.

Best

Malik

keggrest • 12k views

updated 6.7 years ago by Martin Morgan 25k • written 6.8 years ago by malik.yousef ▴ 10

score 1 · Answer 1 · 2018-06-10

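> library(limma)
> library(org.Hs.eg.db)   # annotation database; also attaches AnnotationDbi, which provides mapIds()
> tab <- getGeneKEGGLinks(species="hsa")
> tab$Symbol <- mapIds(org.Hs.eg.db, tab$GeneID,
                       column="SYMBOL", keytype="ENTREZID")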
> head(tab)
  GeneID     PathwayID Symbol
1  10327 path:hsa00010 AKR1A1
2    124 path:hsa00010  ADH1A
3    125 path:hsa00010  ADH1B
4    126 path:hsa00010  ADH1C
5    127 path:hsa00010   ADH4
6    128 path:hsa00010   ADH5

To get names of the pathways:

> head(getKEGGPathwayNames(species="hsa"))
      PathwayID                                                     Description
1 path:hsa00010             Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00020                Citrate cycle (TCA cycle) - Homo sapiens (human)
3 path:hsa00030                Pentose phosphate pathway - Homo sapiens (human)
4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (human)
5 path:hsa00051          Fructose and mannose metabolism - Homo sapiens (human)
6 path:hsa00052                     Galactose metabolism - Homo sapiens (human)
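
The table above has one row per gene, while the question asks for one row per pathway. A minimal base R sketch of that last step, assuming `tab` has the `PathwayID` and `Symbol` columns shown above. The small data frame here is a hand-made stand-in so the example runs without querying KEGG; its `path:toy00001` rows are made-up placeholders, and the output file name is arbitrary.

```r
# Stand-in for `tab` (first rows copied from the output above;
# the "path:toy00001" rows are placeholders, not real KEGG data).
tab <- data.frame(
    GeneID    = c("10327", "124", "125", "1", "2"),
    PathwayID = c("path:hsa00010", "path:hsa00010", "path:hsa00010",
                  "path:toy00001", "path:toy00001"),
    Symbol    = c("AKR1A1", "ADH1A", "ADH1B", "GENE_A", "GENE_B"),
    stringsAsFactors = FALSE
)

# Collapse to one row per pathway, with the gene symbols joined by commas.
# Note: the formula interface drops rows whose Symbol is NA.
by_pathway <- aggregate(Symbol ~ PathwayID, data = tab,
                        FUN = paste, collapse = ",")

# Write a tab-delimited text table.
out_file <- file.path(tempdir(), "kegg_pathway_genes.txt")
write.table(by_pathway, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

Merging `by_pathway` with the result of `getKEGGPathwayNames()` on `PathwayID` would add the pathway description as a further column.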

score 1 · Answer 2 · 2018-06-12

Here I use KEGGREST for the KEGG information and org.Hs.eg.db for symbol mapping. The tidyverse is convenient for working with data.frames.

library(KEGGREST)
library(org.Hs.eg.db)
library(tidyverse)     ## loaded last, so dplyr::select() masks AnnotationDbi::select()

These are the KEGG pathways and their Entrez gene ids

hsa_path_eg  <- keggLink("pathway", "hsa") %>% 
    tibble(pathway = ., eg = sub("hsa:", "", names(.)))
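
The pipe above reads both the values (`.`) and the names (`names(.)`) of what keggLink() returns, which is a named character vector. A small base R sketch of the same reshaping, using a hand-typed vector in that format instead of a live KEGG query (the object names `links` and `path_eg` are only for illustration):

```r
# Hand-typed stand-in for keggLink("pathway", "hsa"): the values are pathway
# ids and the names are KEGG gene ids of the form "hsa:<Entrez id>".
links <- c("hsa:10327" = "path:hsa00010",
           "hsa:124"   = "path:hsa00010",
           "hsa:125"   = "path:hsa00010")

path_eg <- data.frame(
    pathway = unname(links),                   # the values of the vector
    eg      = sub("^hsa:", "", names(links)),  # the names, minus the organism prefix
    stringsAsFactors = FALSE
)
```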

annotated with the SYMBOL and ENSEMBL identifiers associated with each Entrez id

hsa_kegg_anno <- hsa_path_eg %>%
    mutate(
        symbol = mapIds(org.Hs.eg.db, eg, "SYMBOL", "ENTREZID"),
        ensembl = mapIds(org.Hs.eg.db, eg, "ENSEMBL", "ENTREZID")
    )

This gives me

> hsa_kegg_anno
# A tibble: 29,424 x 4
   pathway       eg     symbol  ensembl        
   <chr>         <chr>  <chr>   <chr>          
 1 path:hsa00010 10327  AKR1A1  ENSG00000117448
 2 path:hsa00010 124    ADH1A   ENSG00000187758
 3 path:hsa00010 125    ADH1B   ENSG00000196616
 4 path:hsa00010 126    ADH1C   ENSG00000248144
 5 path:hsa00010 127    ADH4    ENSG00000198099
 6 path:hsa00010 128    ADH5    ENSG00000197894
 7 path:hsa00010 130    ADH6    ENSG00000172955
 8 path:hsa00010 130589 GALM    ENSG00000143891
 9 path:hsa00010 131    ADH7    ENSG00000196344
10 path:hsa00010 160287 LDHAL6A ENSG00000166800
# ... with 29,414 more rows

I can go back to KEGG for the pathway descriptions

hsa_pathways <- keggList("pathway", "hsa") %>% 
    tibble(pathway = names(.), description = .)

so

> hsa_pathways
# A tibble: 328 x 2
   pathway       description                                                   
   <chr>         <chr>                                                         
 1 path:hsa00010 Glycolysis / Gluconeogenesis - Homo sapiens (human)           
 2 path:hsa00020 Citrate cycle (TCA cycle) - Homo sapiens (human)              
 3 path:hsa00030 Pentose phosphate pathway - Homo sapiens (human)              
 4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (huma…
 5 path:hsa00051 Fructose and mannose metabolism - Homo sapiens (human)        
 6 path:hsa00052 Galactose metabolism - Homo sapiens (human)                   
 7 path:hsa00053 Ascorbate and aldarate metabolism - Homo sapiens (human)      
 8 path:hsa00061 Fatty acid biosynthesis - Homo sapiens (human)                
 9 path:hsa00062 Fatty acid elongation - Homo sapiens (human)                  
10 path:hsa00071 Fatty acid degradation - Homo sapiens (human)                 
# ... with 318 more rows

I could join these with the gene identifiers, if desired...

left_join(hsa_kegg_anno, hsa_pathways, by = "pathway")
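
To reach the one-row-per-pathway text table asked for in the question, the joined table can be collapsed with group_by() and summarise(). A minimal sketch on a toy tibble with the same column names as above, assuming dplyr 1.0.0 or later is installed; the second pathway and its description are placeholders, and the file name in the last comment is arbitrary.

```r
library(dplyr)

# Toy stand-in for the joined table (the "path:toy00001" row is a placeholder).
joined <- tibble(
    pathway     = c("path:hsa00010", "path:hsa00010", "path:toy00001"),
    description = c("Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                    "Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                    "Placeholder pathway"),
    symbol      = c("AKR1A1", "ADH1A", "GENE_A")
)

# One row per pathway, with its gene symbols joined by commas.
pathway_genes <- joined %>%
    group_by(pathway, description) %>%
    summarise(genes = paste(symbol, collapse = ","), .groups = "drop")

# To save as a tab-delimited text file:
# write.table(pathway_genes, "kegg_pathway_genes.txt",
#             sep = "\t", quote = FALSE, row.names = FALSE)
```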