Hi
How can I get the list of genes for each KEGG pathway? I need a simple text table in which each row has a KEGG pathway and the next column has the list of genes in that pathway.
Best
Malik
> library(limma)
> library(org.Hs.eg.db)
> tab <- getGeneKEGGLinks(species="hsa")
> tab$Symbol <- mapIds(org.Hs.eg.db, tab$GeneID, column="SYMBOL", keytype="ENTREZID")
> head(tab)
  GeneID     PathwayID Symbol
1  10327 path:hsa00010 AKR1A1
2    124 path:hsa00010  ADH1A
3    125 path:hsa00010  ADH1B
4    126 path:hsa00010  ADH1C
5    127 path:hsa00010   ADH4
6    128 path:hsa00010   ADH5
To get the names of the pathways:
> head(getKEGGPathwayNames(species="hsa"))
      PathwayID                                                     Description
1 path:hsa00010             Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00020                Citrate cycle (TCA cycle) - Homo sapiens (human)
3 path:hsa00030                Pentose phosphate pathway - Homo sapiens (human)
4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (human)
5 path:hsa00051          Fructose and mannose metabolism - Homo sapiens (human)
6 path:hsa00052                     Galactose metabolism - Homo sapiens (human)
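The question asked for one row per pathway with its genes in the next column. A minimal base-R sketch of that last step, assuming the long table `tab` above (one row per gene/pathway pair). Here `tab` is a small stand-in built from the six rows printed above so the snippet runs without network access, and the output file name (a `tempfile()`) is arbitrary.

```r
# Stand-in for the long table from getGeneKEGGLinks() + mapIds();
# these rows are copied from the head(tab) output above.
tab <- data.frame(
  GeneID    = c("10327", "124", "125", "126", "127", "128"),
  PathwayID = "path:hsa00010",
  Symbol    = c("AKR1A1", "ADH1A", "ADH1B", "ADH1C", "ADH4", "ADH5"),
  stringsAsFactors = FALSE
)

# One row per pathway: collapse that pathway's gene symbols into a single
# comma-separated string. (The formula interface drops rows whose Symbol is NA.)
genes_per_path <- aggregate(Symbol ~ PathwayID, data = tab,
                            FUN = function(s) paste(unique(s), collapse = ","))

# Write a plain tab-delimited text file: PathwayID <TAB> gene list.
out <- tempfile(fileext = ".txt")
write.table(genes_per_path, out, sep = "\t", quote = FALSE, row.names = FALSE)
```

With the full `tab` this gives one row per KEGG pathway; collapse `GeneID` instead of `Symbol` if Entrez IDs are preferred.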
Here I use KEGGREST to retrieve KEGG information and org.Hs.eg.db for symbol mapping. The tidyverse is convenient for working with data.frames.
library(KEGGREST)
library(org.Hs.eg.db)
library(tidyverse) ## beware: dplyr::select() masks AnnotationDbi::select()
These are the KEGG pathways and their Entrez gene IDs:
hsa_path_eg <- keggLink("pathway", "hsa") %>%
    tibble(pathway = ., eg = sub("hsa:", "", names(.)))
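`keggLink()` returns a named character vector: the values are pathway IDs and the names are `hsa:`-prefixed Entrez gene IDs, which the pipe above splits into two columns. A base-R equivalent on a two-element toy vector, so it runs offline; the vector only mimics the shape of the real result, using the first two pairs from the output below.

```r
# Toy stand-in for keggLink("pathway", "hsa"): names are gene IDs,
# values are the pathways those genes belong to.
links <- c("hsa:10327" = "path:hsa00010", "hsa:124" = "path:hsa00010")

# Same reshaping as the tibble() call above, in base R:
# values become 'pathway', names minus the "hsa:" prefix become 'eg'.
hsa_path_eg <- data.frame(
  pathway = unname(links),
  eg      = sub("^hsa:", "", names(links)),
  stringsAsFactors = FALSE
)
```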
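Annotate these with the SYMBOL and ENSEMBL identifiers associated with each Entrez ID (by default, mapIds() keeps only the first match when one Entrez ID maps to several identifiers):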
hsa_kegg_anno <- hsa_path_eg %>%
    mutate(
        symbol = mapIds(org.Hs.eg.db, eg, "SYMBOL", "ENTREZID"),
        ensembl = mapIds(org.Hs.eg.db, eg, "ENSEMBL", "ENTREZID")
    )
This gives me:
> hsa_kegg_anno
# A tibble: 29,424 x 4
pathway eg symbol ensembl
<chr> <chr> <chr> <chr>
1 path:hsa00010 10327 AKR1A1 ENSG00000117448
2 path:hsa00010 124 ADH1A ENSG00000187758
3 path:hsa00010 125 ADH1B ENSG00000196616
4 path:hsa00010 126 ADH1C ENSG00000248144
5 path:hsa00010 127 ADH4 ENSG00000198099
6 path:hsa00010 128 ADH5 ENSG00000197894
7 path:hsa00010 130 ADH6 ENSG00000172955
8 path:hsa00010 130589 GALM ENSG00000143891
9 path:hsa00010 131 ADH7 ENSG00000196344
10 path:hsa00010 160287 LDHAL6A ENSG00000166800
# ... with 29,414 more rows
I can go back to KEGG for the pathway descriptions:
hsa_pathways <- keggList("pathway", "hsa") %>%
    tibble(pathway = names(.), description = .)
This gives:
> hsa_pathways
# A tibble: 328 x 2
pathway description
<chr> <chr>
1 path:hsa00010 Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00020 Citrate cycle (TCA cycle) - Homo sapiens (human)
3 path:hsa00030 Pentose phosphate pathway - Homo sapiens (human)
4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (huma…
5 path:hsa00051 Fructose and mannose metabolism - Homo sapiens (human)
6 path:hsa00052 Galactose metabolism - Homo sapiens (human)
7 path:hsa00053 Ascorbate and aldarate metabolism - Homo sapiens (human)
8 path:hsa00061 Fatty acid biosynthesis - Homo sapiens (human)
9 path:hsa00062 Fatty acid elongation - Homo sapiens (human)
10 path:hsa00071 Fatty acid degradation - Homo sapiens (human)
# ... with 318 more rows
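Every description ends in the same species suffix. If a shorter label is wanted in the final table, a regular expression can strip it; this sketch assumes the suffix is exactly " - Homo sapiens (human)", as in the output above.

```r
# Two descriptions copied from the output above.
description <- c("Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                 "Citrate cycle (TCA cycle) - Homo sapiens (human)")

# Drop the trailing species suffix; the parentheses are escaped because
# sub() treats the pattern as a regular expression.
short <- sub(" - Homo sapiens \\(human\\)$", "", description)
```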
I could join these with the gene identifiers, if desired:
left_join(hsa_kegg_anno, hsa_pathways, by = "pathway")
Thanks for your reply.
How can I get the gene symbol matched to each GeneID, and also the KEGG pathway name?
If you have two data.frames that share a column (here PathwayID), it's trivial to match them up. See, e.g., ?match.
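For instance, with the gene table from `getGeneKEGGLinks()` (plus `Symbol`) and the table from `getKEGGPathwayNames()`, `match()` finds, for each gene row, the row of the pathway table with the same `PathwayID`, and that index pulls out the description. A minimal sketch on small stand-in tables copied from the outputs above, so it runs offline:

```r
# Stand-ins for getGeneKEGGLinks() (with Symbol added) and getKEGGPathwayNames().
tab <- data.frame(
  GeneID    = c("10327", "124"),
  PathwayID = c("path:hsa00010", "path:hsa00010"),
  Symbol    = c("AKR1A1", "ADH1A"),
  stringsAsFactors = FALSE
)
pathways <- data.frame(
  PathwayID   = c("path:hsa00010", "path:hsa00020"),
  Description = c("Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                  "Citrate cycle (TCA cycle) - Homo sapiens (human)"),
  stringsAsFactors = FALSE
)

# match() gives, for each PathwayID in 'tab', its row position in 'pathways'
# (NA if absent); indexing with it copies the matching description across.
idx <- match(tab$PathwayID, pathways$PathwayID)
tab$Description <- pathways$Description[idx]
```

`merge(tab, pathways, by = "PathwayID")` performs the same lookup as a join, but by default it sorts the result by the key column.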