Question: How can I get a list of KEGG pathways and the genes in each?
malik.yousef wrote (7 days ago):

Hi

How can I get the list of genes for each KEGG pathway? I need a simple text table in which each row has a KEGG pathway and the next column lists the genes for that pathway.

Best

Malik

Martin Morgan (United States) wrote (5 days ago):

Here I use KEGGREST for KEGG information and org.Hs.eg.db for symbol mapping. The tidyverse is convenient for working with data.frames.

library(KEGGREST)
library(org.Hs.eg.db)
library(tidyverse)     ## loaded last, so dplyr::select() masks AnnotationDbi::select()

These are the KEGG pathways and their Entrez gene ids

hsa_path_eg  <- keggLink("pathway", "hsa") %>% 
    tibble(pathway = ., eg = sub("hsa:", "", names(.)))

annotated with the SYMBOL and ENSEMBL identifiers associated with each Entrez id

hsa_kegg_anno <- hsa_path_eg %>%
    mutate(
        symbol = mapIds(org.Hs.eg.db, eg, "SYMBOL", "ENTREZID"),
        ensembl = mapIds(org.Hs.eg.db, eg, "ENSEMBL", "ENTREZID")
    )

This gives me

> hsa_kegg_anno
# A tibble: 29,424 x 4
   pathway       eg     symbol  ensembl        
   <chr>         <chr>  <chr>   <chr>          
 1 path:hsa00010 10327  AKR1A1  ENSG00000117448
 2 path:hsa00010 124    ADH1A   ENSG00000187758
 3 path:hsa00010 125    ADH1B   ENSG00000196616
 4 path:hsa00010 126    ADH1C   ENSG00000248144
 5 path:hsa00010 127    ADH4    ENSG00000198099
 6 path:hsa00010 128    ADH5    ENSG00000197894
 7 path:hsa00010 130    ADH6    ENSG00000172955
 8 path:hsa00010 130589 GALM    ENSG00000143891
 9 path:hsa00010 131    ADH7    ENSG00000196344
10 path:hsa00010 160287 LDHAL6A ENSG00000166800
# ... with 29,414 more rows

I can go back to KEGG for the pathway descriptions

hsa_pathways <- keggList("pathway", "hsa") %>% 
    tibble(pathway = names(.), description = .)

so

> hsa_pathways
# A tibble: 328 x 2
   pathway       description                                                   
   <chr>         <chr>                                                         
 1 path:hsa00010 Glycolysis / Gluconeogenesis - Homo sapiens (human)           
 2 path:hsa00020 Citrate cycle (TCA cycle) - Homo sapiens (human)              
 3 path:hsa00030 Pentose phosphate pathway - Homo sapiens (human)              
 4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (huma…
 5 path:hsa00051 Fructose and mannose metabolism - Homo sapiens (human)        
 6 path:hsa00052 Galactose metabolism - Homo sapiens (human)                   
 7 path:hsa00053 Ascorbate and aldarate metabolism - Homo sapiens (human)      
 8 path:hsa00061 Fatty acid biosynthesis - Homo sapiens (human)                
 9 path:hsa00062 Fatty acid elongation - Homo sapiens (human)                  
10 path:hsa00071 Fatty acid degradation - Homo sapiens (human)                 
# ... with 318 more rows

I could join these with the gene identifiers, if desired...

left_join(hsa_kegg_anno, hsa_pathways, by = "pathway")
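
The original question asked for one row per pathway with its genes in the next column, whereas the tables above have one row per pathway-gene pair. A minimal sketch of collapsing such a long table into that shape follows. It uses a small hand-typed stand-in so that it runs without querying KEGG: `toy_anno`, `pathway_genes`, the `path:toy00001` rows, the `;` separator, and the output file name are all illustrative assumptions, not part of the answer above.

```r
# Toy stand-in for a long table such as hsa_kegg_anno:
# one row per (pathway, gene) pair. The "path:toy00001" rows are made up.
toy_anno <- data.frame(
    pathway = c("path:hsa00010", "path:hsa00010", "path:hsa00010",
                "path:toy00001", "path:toy00001"),
    symbol  = c("AKR1A1", "ADH1A", "ADH1B", "GENE_A", "GENE_B"),
    stringsAsFactors = FALSE
)

# Collapse to one row per pathway, with its genes joined by ";".
pathway_genes <- aggregate(
    symbol ~ pathway, data = toy_anno,
    FUN = function(x) paste(unique(x), collapse = ";")
)

# Write a tab-delimited text table (the file name is arbitrary).
out_file <- file.path(tempdir(), "pathway_genes.txt")
write.table(pathway_genes, out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Note that the formula interface of aggregate() drops rows whose symbol is NA by default, so genes without a mapped symbol would be omitted from the collapsed column.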
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote (7 days ago):

> library(limma)
> library(org.Hs.eg.db)
> tab <- getGeneKEGGLinks(species="hsa")
> tab$Symbol <- mapIds(org.Hs.eg.db, tab$GeneID,
                       column="SYMBOL", keytype="ENTREZID")
> head(tab)
  GeneID     PathwayID Symbol
1  10327 path:hsa00010 AKR1A1
2    124 path:hsa00010  ADH1A
3    125 path:hsa00010  ADH1B
4    126 path:hsa00010  ADH1C
5    127 path:hsa00010   ADH4
6    128 path:hsa00010   ADH5

To get names of the pathways:

> head(getKEGGPathwayNames(species="hsa"))
      PathwayID                                                     Description
1 path:hsa00010             Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00020                Citrate cycle (TCA cycle) - Homo sapiens (human)
3 path:hsa00030                Pentose phosphate pathway - Homo sapiens (human)
4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (human)
5 path:hsa00051          Fructose and mannose metabolism - Homo sapiens (human)
6 path:hsa00052                     Galactose metabolism - Homo sapiens (human)

 


Thanks for your reply.

How can I get the gene symbol that matches each GeneID, and also the KEGG pathway name?

written 7 days ago by malik.yousef

If you have two data.frames that share a column of identifiers, it's trivial to match them up on that column. See e.g. ?match

written 6 days ago by James W. MacDonald
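
As a hedged illustration of the match() suggestion, here is a small sketch that attaches pathway names to a gene table on a shared PathwayID column. The two data frames are hand-typed from rows shown earlier in the thread so that the code runs offline; the object names `genes`, `pathways`, `idx`, and `merged` are assumptions for this example. merge() is shown as an alternative that does the same lookup in one call.

```r
# Stand-in for the getGeneKEGGLinks() output (rows copied from the thread).
genes <- data.frame(
    GeneID    = c("10327", "124", "125"),
    PathwayID = c("path:hsa00010", "path:hsa00010", "path:hsa00010"),
    stringsAsFactors = FALSE
)

# Stand-in for the getKEGGPathwayNames() output (rows copied from the thread).
pathways <- data.frame(
    PathwayID   = c("path:hsa00010", "path:hsa00020"),
    Description = c("Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                    "Citrate cycle (TCA cycle) - Homo sapiens (human)"),
    stringsAsFactors = FALSE
)

# match() gives, for each gene row, the index of its pathway in `pathways`.
idx <- match(genes$PathwayID, pathways$PathwayID)
genes$Description <- pathways$Description[idx]

# Equivalent lookup with merge(); all.x = TRUE keeps genes whose
# pathway has no entry in `pathways` (Description becomes NA).
merged <- merge(genes[, c("GeneID", "PathwayID")], pathways,
                by = "PathwayID", all.x = TRUE)
```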