KEGG: gene ids for nodes in a pathway


Tim Smith ★ 1.1k


Hi,

I want a list of the genes in a particular pathway, arranged by node. For example, if I select the Jak-STAT pathway (http://www.genome.jp/kegg/pathway/hsa/hsa04630.html), how do I get the Entrez IDs of the genes associated with the node 'STAT'? Currently I use the following code:

    library(org.Hs.eg.db)
    x <- toTable(org.Hs.egPATH)

and then select the genes associated with a particular pathway (e.g. "04630" for Jak-STAT). But this gives the entire set of genes associated with the pathway. Is there a way to get the Entrez IDs of the genes associated with each of the nodes ('JAK', 'STAT', 'STAM', 'PIAS', etc.) in the pathway?

Thanks!


Posted 16.8 years ago by Tim Smith ★ 1.1k


Marc Carlson ★ 7.2k


Hi Tim,

I think that the mapping you are using already maps the Entrez gene IDs associated with a particular pathway. All you need to do is use mget() instead of toTable(). So for pathway "04630", you can get the associated Entrez gene IDs like this:

    library(org.Hs.eg.db)
    mget("04630", revmap(org.Hs.egPATH), ifnotfound=NA)

Marc

Posted 16.8 years ago by Marc Carlson ★ 7.2k
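What the two approaches return can be illustrated with a minimal base-R sketch. It assumes that toTable(org.Hs.egPATH) yields a data frame with the columns gene_id and path_id; the small table below is a hand-made stand-in (the IDs are the STAT genes from Saroj's output further down, and "99999" is a placeholder pathway ID), so it runs without org.Hs.eg.db. Filtering the flat table gives the genes of one pathway, while split() builds the pathway-to-genes direction that revmap() exposes. Like mget(), it yields a list keyed by pathway ID, which is why Jim's reply uses [[1]] to get a plain vector.

```r
# Hand-made stand-in for toTable(org.Hs.egPATH): one row per gene/pathway pair.
# Assumed column names: gene_id, path_id. "99999" is a placeholder pathway ID.
egPath <- data.frame(
  gene_id = c("6772", "6773", "6774", "6772"),
  path_id = c("04630", "04630", "04630", "99999"),
  stringsAsFactors = FALSE
)

# Tim's approach: filter the flat table for one pathway
viaFilter <- unique(egPath$gene_id[egPath$path_id == "04630"])

# Reverse direction (pathway -> genes), analogous to revmap(org.Hs.egPATH);
# split() returns a named list with one element per pathway ID
pathToGenes <- split(egPath$gene_id, egPath$path_id)

# [[ ]] extracts the plain character vector for one key
viaReverse <- pathToGenes[["04630"]]

identical(viaFilter, viaReverse)  # TRUE
```

Both routes give the whole gene set of the pathway; neither says anything about nodes, which is the point of the follow-up below.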


Hi Marc & Saroj,

Thanks for the replies.

As Saroj suggested, I could use grep to get to 'STAT1', 'STAT3', etc. for the STAT node. However, I would like to automate the process for this pathway (and possibly for several pathways). With grep, I would need to look at the pathway in KEGG, figure out the nodes (e.g. 'STAT', 'JAK', 'PI3K', etc.) and then run a grep for each of them to get to the genes associated with that node (e.g. 'STAT1', 'STAT3', etc. for the 'STAT' node). What I am looking for is a way to automate this. I could still use grep if there were some way of getting all the node labels (such as 'STAT') in a particular pathway. Is there such functionality?

Thanks again!

Replied 16.8 years ago by Tim Smith ★ 1.1k


Hi Tim,

If you are willing to make a strong assumption about gene symbols, then you can group things using tapply():

    library(org.Hs.eg.db)
    # Entrez IDs in the Jak-STAT pathway, mapped to gene symbols
    a <- unlist(mget(mget("04630", revmap(org.Hs.egPATH), ifnotfound=NA)[[1]], org.Hs.egSYMBOL))
    # strip trailing digits to get a 'base' symbol (e.g. STAT3 -> STAT)
    b <- sub("[0-9]+$", "", a)
    # group the symbols by base symbol
    tapply(1:length(a), b, function(x) a[x])

This assumes that any numbers at the end of a gene symbol can be stripped off to get the 'base' gene type (e.g. IL2, IL3, IL4 and IL21 are all interleukins), and that all gene symbols are consistent. You could also assume that you can strip off the last letter or two to get the 'base' gene symbol, which might get you a bit closer to what you want. Again, strong assumptions apply.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

Replied 16.8 years ago by James W. MacDonald 68k
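Jim's trailing-digit heuristic can be checked on the STAT symbols from Saroj's output in this thread. This base-R sketch uses split(), which groups the values the same way as the tapply() call above. The looser pattern that also drops one trailing letter is only an illustrative take on the "strip off the last letter or two" suggestion, not Jim's code, and the same strong assumptions about symbol naming apply.

```r
# Gene symbols named by Entrez ID, copied from Saroj's output in this thread
a <- c("6772" = "STAT1", "6773" = "STAT2", "6774" = "STAT3", "6775" = "STAT4",
       "6776" = "STAT5A", "6777" = "STAT5B", "6778" = "STAT6")

# Jim's heuristic: drop trailing digits only.
# STAT5A and STAT5B end in a letter, so they are left as their own groups.
strict <- split(a, sub("[0-9]+$", "", a))
names(strict)   # "STAT" "STAT5A" "STAT5B"

# Illustrative looser variant: drop trailing digits plus at most one letter,
# which folds STAT5A and STAT5B into the STAT group as well
loose <- split(a, sub("[0-9]+[A-Z]?$", "", a))
names(loose)    # "STAT"
```

The split shows why the assumption matters: with digits-only stripping, the single 'STAT' node on the KEGG map ends up as three groups.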


Hi Tim,

Using the KEGGgraph package may solve the problem. As an example:

    library(KEGGgraph)
    # use the human MAPK pathway as an example
    xfile <- system.file("extdata/hsa04010.xml", package="KEGGgraph")
    p <- parseKGML(xfile)
    pNodes <- nodes(p)
    displayNames <- sapply(pNodes, getDisplayName)
    geneids <- sapply(pNodes, function(x) translateKEGG2GeneID(getName(x)))

displayNames now contains the labels (the visible names of the nodes), while geneids contains the Entrez Gene IDs (in the human case) of the genes in each node.

To install KEGGgraph, just type

    source("http://www.bioconductor.org/biocLite.R")
    biocLite("KEGGgraph")

Best wishes,
David

--
Jitao David Zhang
Computational Biology Ph.D.
Division of Molecular Genome Analysis
DKFZ, Heidelberg
D-69120, Germany
http://sites.google.com/site/jazzydevzoo/

Replied 16.8 years ago by Jitao David Zhang ▴ 340
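For human pathways, the node-to-Entrez step in David's example amounts to stripping the organism prefix, because KEGG's human gene identifiers have the form hsa:<NCBI Gene ID>. The base-R sketch below shows that step on a hypothetical node list. In real use the list would presumably be built from lapply(pNodes, getName) and named with displayNames. The node labels and the linked-map entry are made up for illustration (the STAT IDs come from Saroj's output), and non-gene entries are assumed to carry a different prefix such as path:.

```r
# Hypothetical node list: label -> KEGG IDs, shaped like getName() output per node.
# Labels are illustrative; the STAT IDs come from Saroj's output in this thread.
nodeKeggIds <- list(
  STAT = c("hsa:6772", "hsa:6773", "hsa:6774", "hsa:6775",
           "hsa:6776", "hsa:6777", "hsa:6778"),
  "MAPK signaling pathway" = "path:hsa04010"  # a link to another map, not a gene
)

# Keep human gene entries only and drop the "hsa:" prefix;
# for human, the remaining number is the NCBI (Entrez) Gene ID
keggToEntrez <- function(ids) {
  sub("^hsa:", "", ids[grepl("^hsa:", ids)])
}

# One character vector of Entrez IDs per node label
nodeEntrez <- lapply(nodeKeggIds, keggToEntrez)
nodeEntrez$STAT
```

Non-gene nodes come back as empty vectors, so they can be dropped afterwards with something like nodeEntrez[lengths(nodeEntrez) > 0].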


Saroj K Mohapatra ▴ 400


Hello Tim,

To get all the Entrez gene IDs in a pathway ("04630") into a variable called egs:

    > egs = get("04630", revmap(org.Hs.egPATH))

Then get the gene symbols for each gene ID:

    > syms = unlist(mget(egs, org.Hs.egSYMBOL))

Which of these gene symbols contain STAT?

    > syms[grep("STAT", syms)]
        6772     6773     6774     6775     6776     6777     6778
     "STAT1"  "STAT2"  "STAT3"  "STAT4" "STAT5A" "STAT5B"  "STAT6"

Best wishes,
Saroj

Posted 16.8 years ago by Saroj K Mohapatra ▴ 400
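Tim's follow-up asks how to automate Saroj's grep over several node labels. Given a vector of labels, lapply() can run the same lookup once per label. In this base-R sketch, syms copies Saroj's output, and the labels are hypothetical (in practice they would come from the pathway map, for example from KEGGgraph's display names). Matching uses startsWith() (available from R 3.3.0) instead of grep(), so a label only matches symbols that begin with it rather than symbols that merely contain it.

```r
# Symbols named by Entrez ID, copied from Saroj's output
syms <- c("6772" = "STAT1", "6773" = "STAT2", "6774" = "STAT3", "6775" = "STAT4",
          "6776" = "STAT5A", "6777" = "STAT5B", "6778" = "STAT6")

# Hypothetical node labels; a real list would come from the pathway map
nodeLabels <- c("STAT", "STAT5")

# For each label, return the Entrez IDs whose symbol starts with that label
nodeGenes <- lapply(setNames(nodeLabels, nodeLabels), function(lab) {
  names(syms)[startsWith(syms, lab)]
})

nodeGenes$STAT5   # "6776" "6777"
```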


Tim Smith ★ 1.1k


Hi David,

Thanks for the suggestion. That sounds exactly like what I want. I tried to install KEGGgraph, but got some errors:

    > source("http://www.bioconductor.org/biocLite.R")
    Warning messages:
    1: In safeSource() : Redefining ‘biocinstall’
    2: In safeSource() : Redefining ‘biocinstallPkgGroups’
    3: In safeSource() : Redefining ‘biocinstallRepos’
    > biocLite(KEGGgraph)
    Running biocinstall version 2.3.13 with R version 2.8.1
    Your version of R requires version 2.3 of Bioconductor.
    Error in install.packages(pkgs = pkgs, repos = repos, dependencies = dependencies, :
      object "KEGGgraph" not found

Am I doing something wrong?

Thanks!

PS: My session info is:

    > sessionInfo()
    R version 2.8.1 (2008-12-22)
    i386-pc-mingw32

    locale:
    LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

    attached base packages:
    [1] splines tools stats graphics grDevices utils datasets methods base

    other attached packages:
     [1] RankAggreg_0.3-1 gplots_2.6.0 gmodels_2.14.1 gtools_2.5.0-1 gdata_2.4.2 Rgraphviz_1.14.1
     [7] EBImage_2.6.0 KEGGSOAP_1.16.0 SSOAP_0.4-8 RCurl_0.94-0 geneplotter_1.20.0 lattice_0.17-20
    [13] XML_1.99-0 biomaRt_1.16.0 GOstats_2.8.0 Category_2.8.4 RBGL_1.18.0 annotate_1.20.1
    [19] xtable_1.5-4 graph_1.20.0 PFAM.db_2.2.5 GO.db_2.2.5 KEGG.db_2.2.5 org.Mm.eg.db_2.2.6
    [25] org.Hs.eg.db_2.2.6 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.4.3 genefilter_1.22.0 survival_2.34-1
    [31] affy_1.20.2 Biobase_2.2.2

    loaded via a namespace (and not attached):
    [1] affyio_1.10.1 cluster_1.11.12 grid_2.8.1 GSEABase_1.4.0 KernSmooth_2.22-22 MASS_7.2-45
    [7] preprocessCore_1.4.0 RColorBrewer_1.0-2

Posted 16.8 years ago by Tim Smith ★ 1.1k


KEGGgraph is part of the Bioconductor release for R 2.9.0, not R 2.8.1 (as your log shows, R 2.8.1 is tied to Bioconductor 2.3). You will need to upgrade to the latest version of R.

Fraser

Replied 16.8 years ago by Sim, Fraser ▴ 350
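Besides the R version, the specific error in Tim's log comes from the unquoted name: biocLite(KEGGgraph) passes a bare symbol, so R tries to look up an object called KEGGgraph when install.packages() evaluates its pkgs argument. The sketch below reproduces that with a hypothetical stand-in function (showPkgs, which only returns its argument). It also wraps the installer that current Bioconductor releases use in place of biocLite.R (BiocManager) in a function, so running the block installs nothing.

```r
# Hypothetical stand-in for an installer: it just returns its pkgs argument,
# which forces R to evaluate it (as install.packages() did in Tim's log)
showPkgs <- function(pkgs) pkgs

# Unquoted: R looks up an object named KEGGgraph and fails
unquoted <- tryCatch(showPkgs(KEGGgraph), error = function(e) conditionMessage(e))
print(unquoted)   # an "object ... not found" message

# Quoted: the package name is passed as a character string
quoted <- showPkgs("KEGGgraph")

# Current Bioconductor installation route (biocLite.R has been retired);
# defined here but not called, so this block installs nothing
installKEGGgraph <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install("KEGGgraph")
}
```

Quoting alone would not have been enough on R 2.8.1, since the package was not in the Bioconductor 2.3 repository; both fixes are needed.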
