Question: Gene-Metabolite and Metabolite-Pathway network from KEGG with KEGGREST
hkarakurt wrote:

Hello, I need to build a Gene-Metabolite (Compound) network and a Metabolite-Pathway network for Streptomyces coelicolor from KEGG. I tried the KEGGREST package for R, but I could only download the Gene-Pathway network. Does anyone know how I can download these networks with KEGGREST or any other package?

Thank you,

modified 5 months ago by 7kemZmani • written 5 months ago by hkarakurt

In what format or structure do you expect the output?

The API call you're looking for is `keggLink`, but KEGG doesn't link genes to compounds directly.

What you can do is link pathways to compounds and link genes to pathways. From these two links you can derive the third, gene-compound link.

Is that what you want, or do you just want two data frames: gene -> compound and compound -> pathway?
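
The two-step derivation described above can be sketched offline with toy tables. The gene and compound IDs below are made up for illustration. In practice the two tables would come from `keggLink("pathway", "sco")` and `keggLink("pathway", "compound")`, and this sketch assumes the organism pathway IDs have already been mapped onto the `map` reference IDs so that the join keys agree.

```r
# Toy gene -> pathway links (hypothetical IDs, for illustration only)
genePath = data.frame(
  gene    = c("sco:SCO0001", "sco:SCO0002"),
  pathway = c("path:map00010", "path:map00020"),
  stringsAsFactors = FALSE
)

# Toy compound -> pathway links (hypothetical rows, for illustration only)
cpdPath = data.frame(
  cpd     = c("cpd:C00022", "cpd:C00031", "cpd:C00158"),
  pathway = c("path:map00010", "path:map00010", "path:map00020"),
  stringsAsFactors = FALSE
)

# Join on the shared pathway column, then keep the distinct gene-compound pairs
geneCpd = unique(merge(genePath, cpdPath, by = "pathway")[, c("gene", "cpd")])
```

Note that this route pairs every gene with every compound in the same pathway, so it is coarser than a link made at the enzyme level.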

modified 5 months ago • written 5 months ago by 7kemZmani

Yes, these are the data frames I want.

The format is not really important, but it should be a table with one column for the gene and one column for the associated compound.

Thank you for your answer.

modified 5 months ago • written 5 months ago by hkarakurt
7kemZmani wrote:

In KEGG there is no direct gene-to-compound mapping. However, you can get that mapping indirectly by running `keggLink` twice: once to get the gene-EC map and once to get the EC-compound map. Then merge the two on the EC column to get the gene-compound mapping.

Here's how to do it:

library("KEGGREST")

res1 = keggLink("enzyme", "sco")
tmpDF1 = data.frame(ec = res1, gene = names(res1))

#> head(tmpDF1)
#            ec        gene
#1  ec:4.2.1.51 sco:SCO3962
#2 ec:5.4.99.23 sco:SCO2073
#3  ec:2.4.1.18 sco:SCO5440
#4  ec:2.4.1.18 sco:SCO7332
#5  ec:2.7.1.28 sco:SCO0580
#6  ec:2.7.1.29 sco:SCO0580

res2 = keggLink("compound", "enzyme")
tmpDF2 = data.frame(cpd = res2, ec = names(res2))

#> head(tmpDF2)
#         cpd           ec
#1 cpd:C00001 ec:4.2.3.174
#2 cpd:C00001   ec:6.1.2.2
#3 cpd:C00001  ec:5.1.3.14
#4 cpd:C00001  ec:4.6.1.13
#5 cpd:C00001  ec:4.6.1.17
#6 cpd:C00001  ec:4.99.1.5

# merge by column
df = merge(tmpDF1, tmpDF2, by="ec")

#> head(df)
#          ec        gene        cpd
#1 ec:1.1.1.1 sco:SCO7362 cpd:C00001
#2 ec:1.1.1.1 sco:SCO7362 cpd:C16551
#3 ec:1.1.1.1 sco:SCO7362 cpd:C07490
#4 ec:1.1.1.1 sco:SCO7362 cpd:C02909
#5 ec:1.1.1.1 sco:SCO7362 cpd:C00004
#6 ec:1.1.1.1 sco:SCO7362 cpd:C06611

# you can also aggregate by gene, collecting each gene's compounds into one string
aggDF = aggregate(df["cpd"], by = df["gene"], FUN = function(x) paste(unique(x), collapse = ";"))

# or maintain the one-to-one mapping (keep redundant compounds and genes)
df = df[, c("gene", "cpd")]

 

* The second map (compound-pathway) is left to you as an exercise.
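
For anyone who wants a starting point for that exercise, here is a minimal sketch rather than a definitive solution. It assumes that `keggLink("pathway", "compound")` returns reference pathway IDs of the form `path:map00010`, and that `keggLink("pathway", "sco")` returns organism IDs of the form `path:sco00010` sharing the same number. The function names `filterToOrganism` and `compoundPathwayMap` and the toy rows are mine, not part of KEGGREST.

```r
# Keep only compound-pathway rows whose pathway exists in the organism.
# cpdPath:  data frame with columns cpd and pathway (e.g. "path:map00010")
# orgPaths: character vector of organism pathway IDs (e.g. "path:sco00010")
filterToOrganism = function(cpdPath, orgPaths, org) {
  # Assumption: organism pathways reuse the reference map number with another prefix
  refIds = sub(paste0("^path:", org), "path:map", unique(orgPaths))
  cpdPath[cpdPath$pathway %in% refIds, , drop = FALSE]
}

# Download both link tables and build the compound -> pathway data frame.
# Needs KEGGREST and network access, so it is only defined here, not called.
compoundPathwayMap = function(org = "sco") {
  res3 = KEGGREST::keggLink("pathway", "compound")  # names: compounds, values: pathways
  cpdPath = data.frame(cpd = names(res3), pathway = unname(res3),
                       stringsAsFactors = FALSE)
  orgPaths = KEGGREST::keggLink("pathway", org)     # names: genes, values: pathways
  filterToOrganism(cpdPath, orgPaths, org)
}

# Offline check of the filtering step on made-up rows
toy = data.frame(
  cpd     = c("cpd:C00022", "cpd:C00031", "cpd:C99999"),
  pathway = c("path:map00010", "path:map00010", "path:map09999"),
  stringsAsFactors = FALSE
)
toyFiltered = filterToOrganism(toy, c("path:sco00010", "path:sco00010"), "sco")
```

With the package installed, `cpdPathDF = compoundPathwayMap("sco")` should give the second data frame. If you want every compound-pathway pair in KEGG regardless of organism, skip the filtering step.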

written 5 months ago by 7kemZmani

'sco' is the organism code for Streptomyces coelicolor in KEGG

written 5 months ago by 7kemZmani

Find out more about how to use the KEGG API here:

http://www.kegg.jp/kegg/docs/keggapi.html

written 5 months ago by 7kemZmani
AR3513 wrote:

Hello, 

If you want to build a gene-compound network, you can use the MetaboSignal package. See the script below:

library(MetaboSignal)

paths_sco = MS_getPathIds(organism_code = "sco") # See all pathways for S.coelicolor
metabo_paths_sco = paths_sco[paths_sco[, "Path_type"] == "metabolic", 1] # Get metabo pathway IDs

network = MS_keggNetwork(metabo_paths = metabo_paths_sco, expand_genes = TRUE) # If you want the genes to be clustered into orthologs use expand_genes = FALSE

The network is returned as a 3-column matrix. The first two columns form the edge list, and the third gives the direction of the reaction (i.e. reversible or irreversible).
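
To show how a matrix of that shape can be turned into a table, here is a rough sketch on made-up rows. The node IDs, the `cpd:` prefix used to recognise compounds, and the column names `source`, `target` and `direction` are assumptions for illustration, not something taken from the MetaboSignal documentation.

```r
# Made-up stand-in for the 3-column matrix described above
toyNet = matrix(c(
  "sco:SCO0001", "cpd:C00022",  "irreversible",
  "cpd:C00022",  "sco:SCO0002", "reversible",
  "sco:SCO0002", "cpd:C00158",  "reversible"),
  ncol = 3, byrow = TRUE)

# Turn the matrix into a data frame with descriptive (hypothetical) column names
edges = as.data.frame(toyNet, stringsAsFactors = FALSE)
colnames(edges) = c("source", "target", "direction")

# Assumption: compound nodes carry a "cpd:" prefix; everything else is a gene
isCpd = function(x) grepl("^cpd:", x)
nodes = unique(c(edges$source, edges$target))
compounds = nodes[isCpd(nodes)]
genes = nodes[!isCpd(nodes)]

# Count edges per reaction direction
directionCounts = table(edges$direction)
```

On the real output you would start from `edges = as.data.frame(network, stringsAsFactors = FALSE)` and check the actual ID format before splitting the nodes.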

Hope this helps,

Andrea

written 5 months ago by AR3513

I will check the package.

Thank you so much.

written 5 months ago by hkarakurt

I just checked and the MS_getPathIds command is not working. It is in the reference manual, but R cannot find the function.

written 5 months ago by hkarakurt

Hello, 

This is a bit odd, because I just tried it myself and it works for me. Are you sure you have installed and loaded the package? Make sure you haven't missed any of the following steps:

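## Install package (biocLite() has been retired; current Bioconductor releases install through BiocManager)

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

BiocManager::install("MetaboSignal")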

## Load package

library(MetaboSignal)

## Get help for the function MS_getPathIds

help(MS_getPathIds) ## This should show you the documentation of the function

## Then get the pathway IDs for your organism of interest. Here that is S. coelicolor, so organism_code = "sco":

paths_sco = MS_getPathIds(organism_code = "sco") ## This should show you all the KEGG pathways available for your organism

## You can then use all (or some) of these pathways to build the gene-compound network. Is this what you are trying to do? If yes, just do:

metabo_paths_sco = paths_sco[paths_sco[, "Path_type"] == "metabolic", 1] # Get metabo pathway IDs

network = MS_keggNetwork(metabo_paths = metabo_paths_sco, expand_genes = TRUE) # If you want the genes to be clustered into orthologs use expand_genes = FALSE

 

Andrea

written 5 months ago by AR3513