Question

extracting the KEGG pathway for a set of genes

Bogdan ▴ 670

@bogdan-2367

Palo Alto, CA, USA

Dear all, could you please advise:

given a set of gene names, what is the best way to extract the KEGG pathways associated with each gene?

thank you,

-- bogdan

Pathways KEGG • 3.7k views

ADD COMMENT • link 2.7 years ago Bogdan ▴ 670

score 0 · Answer 1 · 2021-08-04

Kevin Blighe ★ 3.9k

@kevin

Republic of Ireland

Hi Bogdan,

I use KEGGprofile. The main function in KEGGprofile is find_enriched_pathway(), and it by default accepts a vector of Entrez gene IDs. However, the original annotation package used by KEGGprofile (KEGG.db) was deprecated in Bioconductor 3.12.

Another popular option is, of course, clusterProfiler: http://yulab-smu.top/clusterProfiler-book/chapter6.html

Kevin

ADD COMMENT • link 2.7 years ago Kevin Blighe ★ 3.9k

Hi Kevin, for some strange reason KEGGprofile is not available for the latest version of R.

> install.packages("KEGGprofile")
Warning in install.packages :
  package ‘KEGGprofile’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

When I tried looking it up on CRAN:

> av <- available.packages(filters=list())
> av[av[, "Package"] == "KEGGprofile", ]
     Package Version Priority Depends Imports LinkingTo Suggests Enhances License License_is_FOSS License_restricts_use OS_type Archs MD5sum NeedsCompilation File Repository

I get only the column headers, with no matching entry. Please let me know how I can resolve this.

ADD REPLY • link 2.7 years ago mshubham • 0

Please re-read Kevin's reply. The KEGG.db package has been deprecated because it used data from over 6 years ago (before the paywall for KEGG data was established). For whatever reason the maintainer for KEGGprofile has not updated their package to use alternative methods.

There is the KEGGREST package, but for simple queries it is pretty cumbersome. I tend to just get the mappings directly using functions from limma and then proceed from there. You don't say the species, so I will assume you want human:

> library(limma)
> z <- getGeneKEGGLinks("hsa")
> head(z)
  GeneID     PathwayID
1  10327 path:hsa00010
2    124 path:hsa00010
3    125 path:hsa00010
4    126 path:hsa00010
5    127 path:hsa00010
6    128 path:hsa00010

> zlst <- split(z[,2], z[,1])
> zlst[1:5]
$`10`
[1] "path:hsa00232" "path:hsa00983" "path:hsa01100" "path:hsa05204"

$`100`
[1] "path:hsa00230" "path:hsa01100" "path:hsa05340"

$`1000`
[1] "path:hsa04514" "path:hsa05412"

$`10000`
 [1] "path:hsa01521" "path:hsa01522" "path:hsa01524" "path:hsa04010"
 [5] "path:hsa04012" "path:hsa04014" "path:hsa04015" "path:hsa04022"
 [9] "path:hsa04024" "path:hsa04062" "path:hsa04066" "path:hsa04068"
[13] "path:hsa04071" "path:hsa04072" "path:hsa04140" "path:hsa04150"
[17] "path:hsa04151" "path:hsa04152" "path:hsa04210" "path:hsa04211"
[21] "path:hsa04213" "path:hsa04218" "path:hsa04261" "path:hsa04370"
[25] "path:hsa04371" "path:hsa04380" "path:hsa04510" "path:hsa04550"
[29] "path:hsa04611" "path:hsa04613" "path:hsa04620" "path:hsa04625"
[33] "path:hsa04630" "path:hsa04660" "path:hsa04662" "path:hsa04664"
[37] "path:hsa04666" "path:hsa04668" "path:hsa04722" "path:hsa04725"
[41] "path:hsa04728" "path:hsa04910" "path:hsa04914" "path:hsa04915"
[45] "path:hsa04917" "path:hsa04919" "path:hsa04920" "path:hsa04922"
[49] "path:hsa04923" "path:hsa04926" "path:hsa04929" "path:hsa04931"
[53] "path:hsa04932" "path:hsa04933" "path:hsa04935" "path:hsa04973"
[57] "path:hsa05010" "path:hsa05017" "path:hsa05131" "path:hsa05132"
[61] "path:hsa05135" "path:hsa05142" "path:hsa05145" "path:hsa05152"
[65] "path:hsa05160" "path:hsa05161" "path:hsa05162" "path:hsa05163"
[69] "path:hsa05164" "path:hsa05165" "path:hsa05166" "path:hsa05167"
[73] "path:hsa05168" "path:hsa05169" "path:hsa05170" "path:hsa05200"
[77] "path:hsa05205" "path:hsa05207" "path:hsa05208" "path:hsa05210"
[81] "path:hsa05211" "path:hsa05212" "path:hsa05213" "path:hsa05214"
[85] "path:hsa05215" "path:hsa05218" "path:hsa05220" "path:hsa05221"
[89] "path:hsa05222" "path:hsa05223" "path:hsa05224" "path:hsa05225"
[93] "path:hsa05226" "path:hsa05230" "path:hsa05231" "path:hsa05235"
[97] "path:hsa05415" "path:hsa05417" "path:hsa05418"

$`100008587`
[1] "path:hsa03008" "path:hsa03010"

# And if you need the pathway names

> zz <- getKEGGPathwayNames("hsa")
> head(zz)
      PathwayID                                                     Description
1 path:hsa00010             Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00020                Citrate cycle (TCA cycle) - Homo sapiens (human)
3 path:hsa00030                Pentose phosphate pathway - Homo sapiens (human)
4 path:hsa00040 Pentose and glucuronate interconversions - Homo sapiens (human)
5 path:hsa00051          Fructose and mannose metabolism - Homo sapiens (human)
6 path:hsa00052                     Galactose metabolism - Homo sapiens (human)
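
To go from a set of genes to named pathways, the two tables above can be joined. The following is only a minimal sketch in base R: the helper name pathways_for_genes is made up for illustration, and the two small data frames are toy stand-ins (a few rows copied from the output above) for what getGeneKEGGLinks("hsa") and getKEGGPathwayNames("hsa") return, so that it runs without a network connection. It also assumes the genes are given as Entrez IDs; gene symbols would need to be converted first.

```r
# Toy stand-ins for the limma tables shown above (a few rows only).
# With limma and a network connection these would instead be:
#   links      <- limma::getGeneKEGGLinks("hsa")
#   path_names <- limma::getKEGGPathwayNames("hsa")
links <- data.frame(
  GeneID    = c("10327", "124", "10"),
  PathwayID = c("path:hsa00010", "path:hsa00010", "path:hsa00232"),
  stringsAsFactors = FALSE
)
path_names <- data.frame(
  PathwayID   = c("path:hsa00010", "path:hsa00020"),
  Description = c("Glycolysis / Gluconeogenesis - Homo sapiens (human)",
                  "Citrate cycle (TCA cycle) - Homo sapiens (human)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one row per (gene, pathway) pair for the
# requested Entrez gene IDs, with the pathway description attached.
pathways_for_genes <- function(genes, links, path_names) {
  # Keep only the links for the genes of interest
  keep <- as.character(links$GeneID) %in% as.character(genes)
  hits <- links[keep, , drop = FALSE]
  # Left join, so pathways without a name in path_names are kept (as NA)
  out <- merge(hits, path_names, by = "PathwayID", all.x = TRUE)
  out <- out[order(out$GeneID, out$PathwayID),
             c("GeneID", "PathwayID", "Description")]
  rownames(out) <- NULL
  out
}

res <- pathways_for_genes(c("124", "10"), links, path_names)
print(res)
```

In this toy example the description for gene 10 comes back as NA only because the stand-in name table is truncated; with the full table from getKEGGPathwayNames() each pathway ID should normally find its name.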

ADD REPLY • link 2.7 years ago James W. MacDonald 65k

Dear gentlemen, thank you for your replies. I have followed the part on mapping pathway IDs to pathway names.

However, given a gene, how do I find the pathways it is associated with?

Is there a way to use GMT files, for example, or any other resource?

ADD REPLY • link 2.7 years ago Bogdan ▴ 670

You can also use the kegg_pathway_annotations function from the OmnipathR package: https://saezlab.github.io/OmnipathR/reference/kegg_pathway_annotations.html

See more KEGG related functions here: https://saezlab.github.io/OmnipathR/reference/, all prefixed with kegg_