KEGG data missing from KEGG.db
Chris Fjell ▴ 60
@chris-fjell-3696
Last seen 9.6 years ago
I'm using KEGG.db and org.Hs.egPATH to find the genes in KEGG pathways and the reverse. It seems to me some pathways are not found in the annotation packages, for example hsa04621.

Can someone correct me? Anyone know where to get the equivalent in, say, a tab-delimited file instead?

For example, when I look up the gene 10392 on the KEGG web site (http://www.genome.jp/dbget-bin/www_bget?hsa:10392) I get pathways

hsa04621 NOD-like receptor signaling pathway
hsa05120 Epithelial cell signaling in Helicobacter pylori infection
hsa05131 Shigellosis

But using KEGGEXTID2PATHID I get no entry for that pathway

> mget( "hsa04621" , revmap(KEGGEXTID2PATHID) )
Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
  value for "hsa04621" not found

And the gene just gives me one pathway:

> mget( "10392" , org.Hs.egPATH )
$`10392`
[1] "05120"

I think I'm doing this correctly as for "hsa04620":

> mget( "hsa04620" , revmap(KEGGEXTID2PATHID) )
$hsa04620
 [1] "10000" "10333" "10454" "114609" "1147" "1326" "1432" "148022"
 [9] "207" "208" "23118" "2353" "23533" "23643" "29110" "3439"
[17] "3440" "3441" "3442" "3443" "3444" "3445" "3446" "3447"
[25] "3448" "3449" "3451" "3452" "3454" "3455" "3456" "353376"
[33] "3551" "3553" "3569" "3576" "3592" "3593" "3627" "3654"
[41] "3661" "3663" "3665" "3725" "3929" "4283" "4615" "4790"
[49] "4792" "51135" "51284" "51311" "5290" "5291" "5293" "5294"
[57] "5295" "5296" "54106" "54472" "5594" "5595" "5599" "5600"
[65] "5601" "5602" "5603" "5604" "5605" "5606" "5608" "5609"
[73] "5879" "5970" "6300" "6348" "6351" "6352" "6373" "6416"
[81] "6696" "6772" "6885" "7096" "7097" "7098" "7099" "7100"
[89] "7124" "7187" "7189" "841" "8503" "8517" "8737" "8772"
[97] "929" "941" "942" "958" "9641"

> mget( "10000" , org.Hs.egPATH )
$`10000`
 [1] "04010" "04012" "04062" "04150" "04210" "04370" "04510" "04530" "04620"
[10] "04630" "04660" "04662" "04664" "04666" "04722" "04910" "04920" "05200"
[19] "05210" "05211" "05212" "05213" "05214" "05215" "05218" "05220" "05221"
[28] "05222" "05223"

Cheers,
-Chris
Annotation Pathways • 1.2k views
Chao-Jen Wong ▴ 580
@chao-jen-wong-3603
Last seen 9.3 years ago
USA/Seattle/Fred Hutchinson Cancer Rese…
Which version of KEGG.db are you using? If you can upgrade your KEGG.db to the current version, 2.4.1, then you should be able to find the information.

> mget( "hsa04621" , revmap(KEGGEXTID2PATHID) )
$hsa04621
 [1] "10392" "10454" "10910" "114548" "1147" "1432" "22861" "22900"
 [9] "23118" "257397" "260434" "29108" "2919" "2920" "329" "330"
[17] "331" "3320" "3326" "3551" "3553" "3569" "3576" "3606"
[25] "4210" "4671" "4790" "4792" "4793" "55914" "5594" "5595"
[33] "5599" "5600" "5601" "5602" "5603" "58484" "59082" "5970"
[41] "6300" "6347" "6352" "6354" "6355" "6356" "6357" "64127"
[49] "64170" "6885" "7124" "7128" "7184" "7189" "7205" "834"
[57] "838" "841" "84674" "8517" "8767" "9051"

> sessionInfo()
R version 2.12.0 Under development (unstable) (2010-05-17 r52025)
x86_64-unknown-linux-gnu

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] KEGG.db_2.4.1 RSQLite_0.9-1 DBI_0.2-5
[4] AnnotationDbi_1.11.3 Biobase_2.9.0

loaded via a namespace (and not attached):
[1] tools_2.12.0

On 07/16/10 15:44, Chris Fjell wrote:
> I'm using KEGG.db and org.Hs.egPATH to find the genes in KEGG pathways
> and the reverse.
> It seems to me some pathways are not found in the annotation packages,
> for example hsa04621.
> [...]

--
Chao-Jen Wong
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., M1-B514
PO Box 19024
Seattle, WA 98109
206.667.4485
cwon2 at fhcrc.org
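The error in the question arises because `mget()` fails as soon as one requested key is absent. A minimal sketch of a lookup that returns NA for absent keys instead, so a missing pathway shows up as a gap rather than an error. `safe_lookup` is a hypothetical helper, and the toy environment only stands in for `revmap(KEGGEXTID2PATHID)`; its contents are three IDs copied from the hsa04620 output in the question, not real package data.

```r
# Hypothetical helper: look up keys in a map, returning NA for keys
# that are absent instead of raising an error.
safe_lookup <- function(keys, map) {
  mget(keys, envir = map, ifnotfound = NA)
}

# Toy stand-in for revmap(KEGGEXTID2PATHID): pathway ID -> gene IDs.
toy_map <- new.env()
assign("hsa04620", c("10000", "10333", "10454"), envir = toy_map)

res <- safe_lookup(c("hsa04620", "hsa04621"), toy_map)

# Keys whose lookup came back as a single NA are missing from the map.
is_missing <- vapply(res, function(v) length(v) == 1L && is.na(v), logical(1))
missing_keys <- names(res)[is_missing]
print(missing_keys)
```

With AnnotationDbi attached, the same call should also work on the real map, for example `safe_lookup("hsa04621", revmap(KEGGEXTID2PATHID))`, since its `mget` method documents `ifnotfound = NA` as a supported value.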
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 7.7 years ago
United States
Hi Chris,

I suspect that there are a couple of things going on here. The first is that I think your version of KEGG.db is out of date. I get many genes for the pathway you checked on the latest version. But I can't be 100% sure because you did not share your sessionInfo() with us. :(

The other issue you are likely facing is that the KEGG website is updated continuously, while the annotation packages are updated (and versioned) twice a year. The web site will give you the most current information, whereas the package will allow you to reproduce your results several months from now, when you finally get that paper written up, or even later, when someone demands that you demonstrate how you got that result two years ago. Both of these approaches are valuable in their own way. But the difference between them means that a few annotations may not be in the latest annotation packages until they are updated again in the fall.

Anyhow, I hope that this helps you.

Marc
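For the tab-delimited file asked about in the question, one option is to flatten a gene-to-pathway list into a two-column table and write it out together with a note of where it came from, so the snapshot stays reproducible in the sense described above. A minimal sketch, assuming the input has the shape that `mget()` returns; `flatten_map`, `write_snapshot`, the column names, and the file name are all illustrative, and the sample data is a small subset copied from the question.

```r
# Hypothetical helper: turn a named list (gene ID -> pathway IDs) into a
# two-column data frame, one row per gene/pathway pair.
flatten_map <- function(m) {
  data.frame(
    gene_id = rep(names(m), lengths(m)),
    path_id = unlist(m, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: write the table as tab-delimited text, preceded by a
# comment line that records the data source.
write_snapshot <- function(tab, path, source_label) {
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines(paste("# source:", source_label), con)
  write.table(tab, con, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Sample data shaped like mget(c("10392", "10000"), org.Hs.egPATH);
# the second entry is cut to three pathways for brevity.
gene2path <- list(
  "10392" = "05120",
  "10000" = c("04010", "04012", "04062")
)

tab <- flatten_map(gene2path)
out_file <- file.path(tempdir(), "gene2kegg.tsv")
write_snapshot(tab, out_file, "example data from this thread")

# Read it back, skipping the comment line and keeping leading zeros.
back <- read.delim(out_file, comment.char = "#", colClasses = "character")
print(back)
```

In real use the label could carry the package version instead, for example `paste0("org.Hs.eg.db_", packageVersion("org.Hs.eg.db"))`, so the file can be traced to one annotation release.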
Chris Fjell ▴ 60
@chris-fjell-3696
Last seen 9.6 years ago
Thanks for the quick replies!

You're right, I neglected my sessionInfo (oops - sorry!). From that I see I'm using KEGG.db_2.3.5. I'm on R 2.10, and BioC 2.6 needs R 2.11, so I'll need to upgrade everything. I reinstalled KEGG.db repeatedly, thinking that would get me the latest and greatest, but of course I need to update R for that!

I'm surprised KEGG.db 2.3.5 didn't have the NOD-like receptor signaling pathway hsa04621...

Cheers,
-Chris

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
 [5] LC_MONETARY=C LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=en_CA.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid tools stats graphics grDevices utils datasets
[8] methods base

other attached packages:
 [1] RBGL_1.24.0 KEGGgraph_1.2.2 Rgraphviz_1.24.0
 [4] KEGG.db_2.3.5 pls_2.1-0 preprocessCore_1.8.0
 [7] limma_3.2.3 SPIA_1.4.0 hopach_2.6.0
[10] cluster_1.13.1 org.Rn.eg.db_2.3.5 org.Mm.eg.db_2.3.6
[13] org.Hs.eg.db_2.3.6 genefilter_1.28.2 annotate_1.24.1
[16] GOstats_2.12.0 graph_1.24.4 Category_2.12.1
[19] GO.db_2.3.5 RSQLite_0.8-4 DBI_0.2-5
[22] AnnotationDbi_1.8.2 CORNA_1.2 XML_2.8-1
[25] GEOquery_2.11.3 RCurl_1.4-1 bitops_1.0-4.1
[28] biomaRt_2.2.0 Biobase_2.6.1

loaded via a namespace (and not attached):
[1] GSEABase_1.8.0 splines_2.10.1 survival_2.35-8 xtable_1.5-6

--
Christopher Fjell, PhD
Postdoctoral Fellow
Centre for Microbial Diseases and Immunity Research
Department of Microbiology and Immunology
University of British Columbia
2259 Lower Mall
Vancouver, British Columbia, Canada, V6T 1Z4
cfjell at interchange.ubc.ca / chris at cmdr.ubc.ca
Fax: (604) 827-5566
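As this thread found, reinstalling an annotation package does not move it past the Bioconductor release tied to the running R version, so it can help to compare versions explicitly before assuming an install picked up new data. A minimal sketch; `needs_upgrade` is a hypothetical helper, and the version numbers are the ones reported in this thread.

```r
# Hypothetical helper: TRUE when the installed version is older than required.
needs_upgrade <- function(installed, required) {
  package_version(installed) < package_version(required)
}

# Versions from this thread: KEGG.db 2.3.5 was installed, while 2.4.1
# contained hsa04621.
kegg_outdated <- needs_upgrade("2.3.5", "2.4.1")
print(kegg_outdated)

# The same kind of comparison for the running R session;
# BioC 2.6 needs R 2.11.
r_too_old <- getRversion() < "2.11.0"
print(r_too_old)

# Report the version of a package that is actually installed (guarded so
# the sketch still runs where KEGG.db is not available).
if (requireNamespace("KEGG.db", quietly = TRUE)) {
  print(packageVersion("KEGG.db"))
}
```

Comparing through `package_version()` rather than as plain strings matters because string comparison would order "2.10" before "2.9".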