GO's to gene's
Loren Engrav ★ 1.0k
@loren-engrav-2040
Is there a BioC package that will find all the GO terms containing some word, such as "collagen", and then find all the genes contained within those terms?

I scanned goProfiles, GOSemSim, GOstats, goTools and topGO, and could not determine that any would do that.

Thank you.
@vincent-j-carey-jr-4
Perhaps there is a package with such functionality. However, with the GO.db package in place, you need to do a little programming, perhaps along the lines of

    querGO = function(str, attr = "definition", ont = "MF") {
        require(GO.db, quietly = TRUE)
        gc = GO_dbconn()
        quer.1 = paste("select go_id, term from go_term where", attr, "like('%")
        quer.2 = "%') and ontology = '"
        quer.3 = "'"
        quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "", sep = "")
        dbGetQuery(gc, quer)
    }

whereby

    > querGO("collagen", "term")
      go_id      term
    1 GO:0004656 procollagen-proline 4-dioxygenase activity
    2 GO:0005518 collagen binding
    3 GO:0008475 procollagen-lysine 5-dioxygenase activity
    4 GO:0019797 procollagen-proline 3-dioxygenase activity
    5 GO:0019798 procollagen-proline dioxygenase activity
    6 GO:0033823 procollagen glucosyltransferase activity
    7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
    8 GO:0050211 procollagen galactosyltransferase activity
    9 GO:0070052 collagen V binding
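One caveat with pasting the search string straight into the SQL is that a word containing a quote character breaks the query. Below is a minimal sketch of the same lookup with bound parameters. It assumes a DBI/RSQLite version whose dbGetQuery() accepts a params argument; go_term_query() and search_go() are hypothetical helper names, not part of GO.db.

```r
# Build the SQL text with "?" placeholders. A column name cannot be a
# bound parameter, so it is restricted to a fixed set of choices instead.
go_term_query <- function(attr = c("definition", "term")) {
  attr <- match.arg(attr)
  sprintf("SELECT go_id, term FROM go_term WHERE %s LIKE ? AND ontology = ?",
          attr)
}

# Hypothetical wrapper around the GO.db SQLite connection.
# The search word and ontology are passed as bound parameters.
search_go <- function(word, attr = "definition", ont = "MF") {
  stopifnot(requireNamespace("GO.db", quietly = TRUE))
  DBI::dbGetQuery(GO.db::GO_dbconn(), go_term_query(attr),
                  params = list(paste0("%", word, "%"), ont))
}

# Only attempt the lookup when GO.db is actually installed.
if (requireNamespace("GO.db", quietly = TRUE)) {
  print(head(search_go("collagen", attr = "term")))
}
```

The design choice here is simply to keep user text out of the SQL string; the result should have the same two columns as querGO() above.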
From: Martin Morgan
Date: Sun, 28 Feb 2010 18:42:33 -0800

Also

    library(GO.db)
    terms <- Term(GOTERM)   # or maybe Definition(GOTERM) ?
    ontologies <- Ontology(GOTERM)
    collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]

and the next step,

    library(org.Hs.eg.db)
    egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
    egids <- egids[!is.na(egids)]

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
From: Loren Engrav
Date: 02/28/2010 07:17 PM

Thank you both.

Given my skills, it might be easier/quicker to do it "manually" with AmiGO, but I am trying both methods.

For the second method I get

    > library(GO.db)
    Loading required package: AnnotationDbi
    Loading required package: Biobase

    Welcome to Bioconductor

      Vignettes contain introductory material. To view, type
      'openVignette()'. To cite Bioconductor, see
      'citation("Biobase")' and for packages 'citation(pkgname)'.

    Loading required package: DBI
    > terms <- Term(GOTERM)
    Error in function (classes, fdef, mtable)  :
      unable to find an inherited method for function "Term", for signature
      "GOTermsAnnDbBimap"

    > sessionInfo()
    R version 2.9.2 Patched (2009-09-05 r49613)
    i386-apple-darwin9.8.0

    locale:
    en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base
From: Martin Morgan
Date: Sun, 28 Feb 2010 19:30:34 -0800

Update to R version 2.10 and the associated Bioc packages, or, for a (much) slower solution (you'll want to check that Term and Ontology return ids in identical order),

    terms = eapply(GOTERM, Term)

etc. I have

    > sessionInfo()
    R version 2.10.1 Patched (2010-02-23 r51168)
    x86_64-unknown-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] GO.db_2.3.5         RSQLite_0.7-3       DBI_0.2-4
    [4] AnnotationDbi_1.8.1 Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] tools_2.10.1

Martin
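The caveat about Term and Ontology coming back in the same order can be sidestepped by lining the two results up by GO id rather than by position. A minimal sketch on toy data follows; match_terms() is a hypothetical helper, and the two small lists stand in for what eapply(GOTERM, Term) and eapply(GOTERM, Ontology) might return.

```r
# Filter terms by a word and an ontology, aligning the two inputs by name
# (GO id) so that differing element order cannot mismatch them.
match_terms <- function(terms, ontologies, word, ont = "MF") {
  terms <- unlist(terms)
  ontologies <- unlist(ontologies)[names(terms)]  # reorder by GO id
  stopifnot(!anyNA(ontologies))                   # every term needs an ontology
  terms[grepl(word, terms, fixed = TRUE) & ontologies == ont]
}

# Toy stand-ins, deliberately listed in different orders.
toy_terms <- list(`GO:0005518` = "collagen binding",
                  `GO:0030199` = "collagen fibril organization",
                  `GO:0003677` = "DNA binding")
toy_onts  <- list(`GO:0003677` = "MF",
                  `GO:0030199` = "BP",
                  `GO:0005518` = "MF")

collagen_mf <- match_terms(toy_terms, toy_onts, "collagen", ont = "MF")
print(collagen_mf)
```

With the real annotation objects the same function should apply unchanged, since only named lists of length-one strings are assumed.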
From: Loren Engrav
Date: Sun, 28 Feb 2010 20:28:17 -0800

Ok, thank you. I now show

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    i386-apple-darwin9.8.0

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] org.Hs.eg.db_2.3.6  GO.db_2.3.5         RSQLite_0.8-3       AnnotationDbi_1.8.1 DBI_0.2-5
    [6] Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] tools_2.10.1

And all commands pass with no errors; however, I see

    > egids
    $`GO:0010711`
       IEP
    "1471"

    $`GO:0030199`
      IEA   IEA   ISS   IEA    IMP    IMP    IMP    IMP    NAS    IMP    NAS    IMP    ISS
    "302" "304" "538" "871" "1277" "1278" "1280" "1281" "1281" "1289" "1289" "1290" "1290"
       NAS    IDA    NAS    IEA    IEA    IEA    IEA    IEA    NAS    ISS    IDA    ISS    NAS
    "1301" "1302" "1303" "1805" "2296" "2303" "4010" "4015" "4060" "4763" "7042" "7046" "7373"
       NAS     NAS
    "9508" "50509"

    $`GO:0030574`
       IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA    IEA
    "4312" "4313" "4314" "4316" "4317" "4318" "4319" "4320" "4322" "4325" "4327"
       IEA    IDA    IMP    NAS    IEA    NAS    IEA     IEA     IEA      IEA
    "5184" "5645" "5645" "5653" "5657" "9508" "9509" "56547" "64066" "140766"

    $`GO:0032963`
       IEA    IMP
    "3091" "7148"

    $`GO:0032964`
      IEA    IMP    IMP    TAS    IMP
    "871" "1277" "1281" "1281" "1289"

    $`GO:0032966`
       IDA     IC
    "3569" "4261"

    $`GO:0032967`
      ISS    IDA    IDA     IC    IMP    TAS    IMP
    "265" "2147" "2149" "3066" "7040" "7040" "7043"

    $`GO:0033342`
        IMP
    "23560"

So many GO terms containing the word "collagen" are not listed, like 0004656, 0005518, etc. AmiGO claims there are 68 such terms and the list above has only 8. What did I do wrong?

Also, I would like to omit the IEA group.

Thank you
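On omitting the IEA group: in the egids output each gene id is named by its evidence code, so one option is to filter on those names. A minimal sketch on toy data follows; drop_evidence() is a hypothetical helper, and the first two toy entries copy values from the output above while the third is made up.

```r
# Remove gene ids whose evidence code (the element name) is in `codes`,
# then drop GO terms that have no gene ids left.
drop_evidence <- function(go2eg, codes = "IEA") {
  kept <- lapply(go2eg, function(ids) ids[!(names(ids) %in% codes)])
  kept[lengths(kept) > 0]
}

toy_egids <- list(
  `GO:0032963` = c(IEA = "3091", IMP = "7148"),
  `GO:0010711` = c(IEP = "1471"),
  `GO:toy`     = c(IEA = "1", IEA = "2")   # made-up entry, IEA only
)

filtered <- drop_evidence(toy_egids)
print(filtered)
```

Passing codes = c("IEA", "NAS") would drop several evidence groups at once, if that is wanted.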
From: Loren Engrav
Date: Sun, 28 Feb 2010 20:33:05 -0800

Oops, AmiGO says there are 20 such terms, not 68 as I said before, because I retrieved only BP.
From: Loren Engrav

So I checked

    > collagen

and this list matches AmiGO. So it would appear the issue lies in

    > egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)

Some of the names are finding no associated genes in org.Hs.egGO2EG and so appear as NA. True? Possible?

My version of org.Hs.eg.db (which provides org.Hs.egGO2EG) is 2.3.6.
>>>>>>> And then find all the genes contained within those found terms >>>>>>> >>>>>>> I scanned >>>>>>> GoProfiles >>>>>>> GOSemSim >>>>>>> GOstats >>>>>>> GoTools and >>>>>>> TopGO >>>>>>> >>>>>>> And could not determine that any would do that. >>>>>>> >>>>>>> Thank you. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at stat.math.ethz.ch >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at stat.math.ethz.ch >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>>> >>>>> -- >>>>> Martin Morgan >>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N. >>>>> PO Box 19024 Seattle, WA 98109 >>>>> >>>>> Location: Arnold Building M1 B861 >>>>> Phone: (206) 667-2793 >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >>> -- >>> Martin Morgan >>> Computational Biology / Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N. >>> PO Box 19024 Seattle, WA 98109 >>> >>> Location: Arnold Building M1 B861 >>> Phone: (206) 667-2793
On 02/28/2010 09:01 PM, Loren Engrav wrote:
> So I checked
>     > collagen
> And this list matches Amigo
> So then would appear the issue lies in
>     > egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
> Some of the names are finding no associated genes in org.Hs.egGO2EG and so
> appear as NA
> True? Possible?

yes. GO is not H. sapiens specific and ENTREZ ids are not 100% comprehensive, so some GO terms do not map to ENTREZ ids.

>>> Also I would like to omit the IEA group

maybe

    egids <- lapply(egids, function(elt) elt[names(elt) != "IEA"])
    egids[sapply(egids, length) != 0]

Martin
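For readers following along without the annotation packages loaded, the two filtering steps in this reply can be tried on a small hand-built list. This is only a sketch: the list below is a toy stand-in for what `mget(..., org.Hs.egGO2EG, ifnotfound=NA)` returns, the first two entries are copied from the `egids` output quoted in this thread, and the `GO:toyA` / `GO:toyB` entries are hypothetical placeholders, not real GO ids.

```r
# Toy stand-in for the mget() result: one character vector of Entrez ids
# per GO id, with the evidence code stored in the element names.
egids <- list(
  "GO:0032963" = c(IEA = "3091", IMP = "7148"),  # copied from the thread
  "GO:0032966" = c(IDA = "3569", IC = "4261"),   # copied from the thread
  "GO:toyA"    = c(IEA = "1", IEA = "2"),        # hypothetical: IEA-only term
  "GO:toyB"    = NA                              # hypothetical: no mapped gene
)

# 1. Drop GO ids that had no match (the NA entries from ifnotfound = NA).
egids <- egids[!is.na(egids)]

# 2. Remove every annotation whose evidence code is IEA.
no_iea <- lapply(egids, function(elt) elt[names(elt) != "IEA"])

# 3. Drop GO ids that have no genes left after the filter.
no_iea <- no_iea[sapply(no_iea, length) != 0]

kept_terms <- names(no_iea)
kept_genes <- unique(unlist(no_iea, use.names = FALSE))
print(kept_terms)
print(kept_genes)
```

Note that the second line of the original suggestion prints the filtered list without assigning it; the sketch assigns the result so that it can be reused.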
Thank you
You are clearly very good at this

So to check it all out I did it manually on Amigo. Amigo found 33 genes (limited to Human and omitting IEA).

The org.Hs.eg.db method found 29 of the 33 but did not find

    CST3  (1471)  GO:0010711  IEP
    HIF1A (3091)  GO:0032963  ISS
    IL6R  (3570)  GO:0032966  IDA
    TRAM2 (9697)  GO:0032964  IMP

I suppose that figuring out, for example, why org.Hs.eg.db does not map 9697 to GO:0032964 is complex

Thank you
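A comparison like this one can be cross-checked with plain set operations rather than by eye. The sketch below is illustrative only: `amigo_only` holds the four ids listed in the post above, and `egids` is a small stand-in built from the IEA-filtered values printed earlier in the thread (in a real session it would be the full list derived from org.Hs.egGO2EG).

```r
# The four genes reported as missing, keyed by symbol (from the post above).
amigo_only <- c(CST3 = "1471", HIF1A = "3091", IL6R = "3570", TRAM2 = "9697")

# Stand-in for the IEA-filtered egids list; values copied from the egids
# output shown earlier in the thread with the IEA entries removed.
egids <- list(
  "GO:0010711" = c(IEP = "1471"),
  "GO:0032963" = c(IMP = "7148"),
  "GO:0032966" = c(IDA = "3569", IC = "4261")
)

# Flatten to one vector of Entrez ids, then split the Amigo ids by membership.
found         <- unique(unlist(egids, use.names = FALSE))
present       <- amigo_only[amigo_only %in% found]
still_missing <- amigo_only[!amigo_only %in% found]

print(present)
print(still_missing)
```

On this stand-in data, 1471 turns up in the package result while 3091, 3570 and 9697 do not, which narrows down which ids actually need investigating.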
On 03/01/2010 06:34 PM, Loren Engrav wrote:
> Thank you
> You are clearly very good at this
>
> So to check it all out I did it manually on Amigo. Amigo found 33 genes
> (limited to Human and omitting IEA)
>
> The org.HS.eg.db method found 29 of the 33 but did not find
> CST3 (1471) GO:0010711 IEP
> HIF1A (3091) GO:0032963 ISS
> IL6R (3570), GO:0032966 IDA and
> TRAM2 (9697) GO:0032964 IMP
>
> I suppose to figure out, for example, why org.Hs.eg.db does not map 9697 to
> GO:0032964 is complex

    > names(org.Hs.egGO[["9697"]])
    [1] "GO:0015031" "GO:0065002" "GO:0016020" "GO:0016021"

Hmm, what are the offspring / ancestors of GO:0032964?

    > GOBPOFFSPRING[["GO:0032964"]]
    [1] "GO:0032965" "GO:0032966" "GO:0032967"
    > GOBPANCESTOR[["GO:0032964"]]
     [1] "all"        "GO:0008152" "GO:0008150" "GO:0009058" "GO:0009059"
     [6] "GO:0032501" "GO:0032963" "GO:0043170" "GO:0044236" "GO:0044259"

Nope, nothing jumping out. Where's the GO data coming from?

    > org.Hs.eg() ## or GO()
    [snip]
    Date for GO data: 20090830

Whereas AMIGO says (at the bottom of each page)

    GO database release 2010-02-27

so that looks like a likely issue that would require some more substantial investigation. Merits of using a 'current' db (Amigo) vs a 'versioned' db (GO.db)? See mailing list archives, e.g., current state-of-knowledge vs. reproducibility (how would we redo the analysis we did last month and get the same results with AMIGO?).

On the other hand

    > org.Hs.egGO2EG[["GO:0010711"]]
       IEP
    "1471"
    > GOTERM[["GO:0010711"]]
    GOID: GO:0010711
    Term: negative regulation of collagen catabolic process
    Ontology: BP
    Definition: Any process that decreases the rate, frequency or extent of
        collagen catabolism. Collagen catabolism is the proteolytic chemical
        reactions and pathways resulting in the breakdown of collagen in the
        extracellular matrix.
    Synonym: down regulation of collagen catabolic process
    Synonym: down-regulation of collagen catabolic process
    Synonym: downregulation of collagen catabolic process
    Synonym: inhibition of collagen catabolic process
    Synonym: negative regulation of collagen breakdown
    Synonym: negative regulation of collagen catabolism
    Synonym: negative regulation of collagen degradation

so why didn't we find that one?

    > terms <- Term(GOTERM) # or maybe Definition(GOTERM)
    > "GO:0010711" %in% names(terms)
    [1] TRUE
    > terms[["GO:0010711"]]
    [1] "negative regulation of collagen catabolic process"

yep it's there

    > ontologies <- Ontology(GOTERM)
    > ontologies[["GO:0010711"]]
    [1] "BP"
    > collagen <- terms[grepl("collagen", terms) & ("BP" == ontologies)]
    > collagen[["GO:0010711"]]
    [1] "negative regulation of collagen catabolic process"

yep it's there (or were we looking for MF, as in the earlier code?).

    > egids[["GO:0010711"]]
       IEP
    "1471"

yep it's there. So this makes me think it's a programming error or a miscommunication. I'd suggest you write a little function

    getGO <- function(termLike, ontology, excludeEvidence) {
        ## a few lines of code here, representing the query you perform
    }

and perhaps sharing that with the list will shed some light.

Martin
Vincent Carey wrote: >>>>>>>>> Perhaps there is a package with such functionality. However, with the >>>>>>>>> GO.db package in place, you need to do a little >>>>>>>>> programming, perhaps along the lines of >>>>>>>>> >>>>>>>>> querGO = function(str, attr = "definition", ont = "MF") { >>>>>>>>> require(GO.db, quietly = TRUE) >>>>>>>>> gc = GO_dbconn() >>>>>>>>> quer.1 = paste("select go_id, term from go_term where", >>>>>>>>> attr, "like('%") >>>>>>>>> quer.2 = "%') and ontology = '" >>>>>>>>> quer.3 = "'" >>>>>>>>> quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "", >>>>>>>>> sep = "") >>>>>>>>> dbGetQuery(gc, quer) >>>>>>>>> } >>>>>>>>> >>>>>>>>> whereby >>>>>>>>> >>>>>>>>>> querGO("collagen", "term") >>>>>>>>> go_id >>>>>>>>> term >>>>>>>>> 1 GO:0004656 procollagen-proline 4-dioxygenase >>>>>>>>> activity >>>>>>>>> 2 GO:0005518 collagen >>>>>>>>> binding >>>>>>>>> 3 GO:0008475 procollagen-lysine 5-dioxygenase >>>>>>>>> activity >>>>>>>>> 4 GO:0019797 procollagen-proline 3-dioxygenase >>>>>>>>> activity >>>>>>>>> 5 GO:0019798 procollagen-proline dioxygenase >>>>>>>>> activity >>>>>>>>> 6 GO:0033823 procollagen glucosyltransferase >>>>>>>>> activity >>>>>>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based >>>>>>>>> cuticle >>>>>>>>> 8 GO:0050211 procollagen galactosyltransferase >>>>>>>>> activity >>>>>>>>> 9 GO:0070052 collagen V >>>>>>>>> binding >>>>>>>>>> >>>>>>>> >>>>>>>> Also >>>>>>>> >>>>>>>> library(GO.db) >>>>>>>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) ? 
>>>>>>>> ontologies <- Ontology(GOTERM) >>>>>>>> collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)] >>>>>>>> >>>>>>>> and the next step, >>>>>>>> >>>>>>>> library(org.Hs.eg.db) >>>>>>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA) >>>>>>>> egids <- egids[!is.na(egids)] >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at="" u.washington.edu=""> >>>>>>>>> wrote: >>>>>>>>>> Is there a BioC package that will find all the GO terms containing >>>>>>>>>> some >>>>>>>>>> word, like perhaps ?collagen? >>>>>>>>>> And then find all the genes contained within those found terms >>>>>>>>>> >>>>>>>>>> I scanned >>>>>>>>>> GoProfiles >>>>>>>>>> GOSemSim >>>>>>>>>> GOstats >>>>>>>>>> GoTools and >>>>>>>>>> TopGO >>>>>>>>>> >>>>>>>>>> And could not determine that any would do that. >>>>>>>>>> >>>>>>>>>> Thank you. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [[alternative HTML version deleted]] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioconductor mailing list >>>>>>>>>> Bioconductor at stat.math.ethz.ch >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>>> Search the archives: >>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioconductor mailing list >>>>>>>>> Bioconductor at stat.math.ethz.ch >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>>>> Search the archives: >>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Martin Morgan >>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>>>>> 1100 Fairview Ave. N. 
>>>>>>>> PO Box 19024 Seattle, WA 98109 >>>>>>>> >>>>>>>> Location: Arnold Building M1 B861 >>>>>>>> Phone: (206) 667-2793 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at stat.math.ethz.ch >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>> >>>>>> >>>>>> -- >>>>>> Martin Morgan >>>>>> Computational Biology / Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N. >>>>>> PO Box 19024 Seattle, WA 98109 >>>>>> >>>>>> Location: Arnold Building M1 B861 >>>>>> Phone: (206) 667-2793 >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> -- >> Martin Morgan >> Computational Biology / Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N. >> PO Box 19024 Seattle, WA 98109 >> >> Location: Arnold Building M1 B861 >> Phone: (206) 667-2793 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
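The term search discussed in this thread combines a text match on the term names with a filter on the ontology, and the choice of "MF" versus "BP" changes which GO ids come back. What follows is only a minimal sketch of that idea in base R, under stated assumptions: the two named vectors are toy stand-ins for what Term(GOTERM) and Ontology(GOTERM) return, with term names copied from the posts here, and find_terms() is a hypothetical helper, not a GO.db function.

```r
# Toy stand-ins for Term(GOTERM) and Ontology(GOTERM): named character
# vectors keyed by GO id (assumption: the real objects have this shape).
terms <- c(
    "GO:0004656" = "procollagen-proline 4-dioxygenase activity",
    "GO:0005518" = "collagen binding",
    "GO:0070052" = "collagen V binding",
    "GO:0010711" = "negative regulation of collagen catabolic process",
    "GO:0008150" = "biological_process"
)
ontologies <- c(
    "GO:0004656" = "MF", "GO:0005518" = "MF", "GO:0070052" = "MF",
    "GO:0010711" = "BP", "GO:0008150" = "BP"
)

# Both vectors must be keyed in the same order, otherwise the element-wise
# "&" below would pair a term with the wrong ontology.
stopifnot(identical(names(terms), names(ontologies)))

# find_terms() is a hypothetical helper: keep terms whose name contains
# `pattern` (literal match) and whose ontology equals `ontology`.
find_terms <- function(pattern, ontology) {
    terms[grepl(pattern, terms, fixed = TRUE) & ontologies == ontology]
}

mf <- find_terms("collagen", "MF")
bp <- find_terms("collagen", "BP")
print(names(mf))
print(names(bp))
```

With the real annotation objects the same two calls would be a quick way to see whether a term reported by Amigo is absent from the result only because the other ontology was requested.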
ADD REPLY
This is fun, least for me.

But I am smiling, you want me to write a query? I can barely plagiarize your commands.

Ok I can do this.

You could not find 9697, so be it.
You found 1471, so why do I have it missing? My mistake, it was on the previous printout page.

Ok so I try your method for 3570 and 3091.

    > names(org.Hs.egGO[["3570"]])
     [1] "GO:0002384" "GO:0002548" "GO:0002690" "GO:0006953" "GO:0050829" "GO:0042981" "GO:0031018"
     [8] "GO:0031018" "GO:0032722" "GO:0032755" "GO:0034097" "GO:0042517" "GO:0045669" "GO:0045768"
    [15] "GO:0048661" "GO:0050731" "GO:0070102" "GO:0070120" "GO:0005886" "GO:0016021" "GO:0005576"
    [22] "GO:0005896" "GO:0016324" "GO:0005102" "GO:0004872" "GO:0004897" "GO:0004915" "GO:0019899"
    [29] "GO:0042803" "GO:0070119"

Nope, 0032966 not there. I check Amigo and it is there.

    > names(org.Hs.egGO[["3091"]])
     [1] "GO:0001666" "GO:0001755" "GO:0001837" "GO:0001892" "GO:0001938" "GO:0001947" "GO:0002248"
     [8] "GO:0007165" "GO:0006089" "GO:0006355" "GO:0006879" "GO:0010575" "GO:0010634" "GO:0014850"
    [15] "GO:0042981" "GO:0046886" "GO:0030154" "GO:0030949" "GO:0032364" "GO:0032722" "GO:0032909"
    [22] "GO:0032963" "GO:0035162" "GO:0042541" "GO:0042593" "GO:0042789" "GO:0043193" "GO:0043619"
    [29] "GO:0045648" "GO:0045766" "GO:0045821" "GO:0045926" "GO:0045941" "GO:0045944" "GO:0046716"
    [36] "GO:0050790" "GO:0051000" "GO:0051216" "GO:0051541" "GO:0005634" "GO:0005737" "GO:0005667"
    [43] "GO:0005730" "GO:0009434" "GO:0003705" "GO:0004871" "GO:0008134" "GO:0051879" "GO:0035035"
    [50] "GO:0043565" "GO:0046982" "GO:0046982"

Yup, 0032963 is there, so why missed? So

    > org.Hs.egGO2EG[["GO:0032963"]]
       IEA    IMP
    "3091" "7148"

And I trimmed IEA. But Amigo indicates the evidence is ISS.

So we have
Two not there
One my mistake and
One org.Hs.eg.db lists as IEA and Amigo as ISS.

I suppose since this question can be answered quite easily with Amigo and they update Amigo assocdb weekly, I should just stick with Amigo for questions like this. But R/BioC is more fun. And once you have the commands in the R.app history, redoing the event is painless, sort of.

Again, thank you.
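The bookkeeping in the post above (which of the four genes are absent, which are present, and which disappear only once IEA is trimmed) can be sketched in a few lines of base R. This is a toy illustration under stated assumptions, not the packages' own method: the list copies a handful of org.Hs.egGO2EG entries exactly as printed in this thread, and classify() is a hypothetical helper introduced only for this example.

```r
# Toy copy of a few org.Hs.egGO2EG entries as printed in this thread:
# names are evidence codes, values are Entrez Gene ids.
egids <- list(
    "GO:0010711" = c(IEP = "1471"),
    "GO:0032963" = c(IEA = "3091", IMP = "7148"),
    "GO:0032964" = c(IEA = "871", IMP = "1277", IMP = "1281",
                     TAS = "1281", IMP = "1289"),
    "GO:0032966" = c(IDA = "3569", IC = "4261")
)

# The four genes Amigo reported that were not seen in the R result.
queried <- c(CST3 = "1471", HIF1A = "3091", IL6R = "3570", TRAM2 = "9697")

# classify() is a hypothetical helper: report whether a gene survives the
# evidence filter, is removed by it, or is not annotated at all.
classify <- function(gene, egids, exclude = "IEA") {
    all_ids <- unlist(lapply(egids, unname))
    kept <- lapply(egids, function(elt) elt[!(names(elt) %in% exclude)])
    kept_ids <- unlist(lapply(kept, unname))
    if (gene %in% kept_ids) {
        "found"
    } else if (gene %in% all_ids) {
        "dropped by evidence filter"
    } else {
        "not annotated"
    }
}

status <- vapply(queried, classify, character(1), egids = egids)
print(status)
```

Run against the toy data, the three outcomes line up with the summary in the post: one gene that was there all along, one removed because the package records it as IEA, and two with no annotation in these entries.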
ADD REPLY
On 03/01/2010 08:54 PM, Loren Engrav wrote: > This is fun, least for me yes it is fun... > But I am smiling, you want me to write a query? I can barely plagiarize your > commands. > > Ok I can do this > > You could not find 9697, sobeit. > You found 1471 so why do I have it missing? My mistake, it was on the > previous printout page. > > Ok so I try your method for 3570 and 3091. > >> names (org.Hs.egGO[["3570"]]) > [1] "GO:0002384" "GO:0002548" "GO:0002690" "GO:0006953" "GO:0050829" > "GO:0042981" "GO:0031018" > [8] "GO:0031018" "GO:0032722" "GO:0032755" "GO:0034097" "GO:0042517" > "GO:0045669" "GO:0045768" > [15] "GO:0048661" "GO:0050731" "GO:0070102" "GO:0070120" "GO:0005886" > "GO:0016021" "GO:0005576" > [22] "GO:0005896" "GO:0016324" "GO:0005102" "GO:0004872" "GO:0004897" > "GO:0004915" "GO:0019899" > [29] "GO:0042803" "GO:0070119" > > Nope, 0032966 not there. I check Amigo and it is there. > >> names (org.Hs.egGO[["3091"]]) > [1] "GO:0001666" "GO:0001755" "GO:0001837" "GO:0001892" "GO:0001938" > "GO:0001947" "GO:0002248" > [8] "GO:0007165" "GO:0006089" "GO:0006355" "GO:0006879" "GO:0010575" > "GO:0010634" "GO:0014850" > [15] "GO:0042981" "GO:0046886" "GO:0030154" "GO:0030949" "GO:0032364" > "GO:0032722" "GO:0032909" > [22] "GO:0032963" "GO:0035162" "GO:0042541" "GO:0042593" "GO:0042789" > "GO:0043193" "GO:0043619" > [29] "GO:0045648" "GO:0045766" "GO:0045821" "GO:0045926" "GO:0045941" > "GO:0045944" "GO:0046716" > [36] "GO:0050790" "GO:0051000" "GO:0051216" "GO:0051541" "GO:0005634" > "GO:0005737" "GO:0005667" > [43] "GO:0005730" "GO:0009434" "GO:0003705" "GO:0004871" "GO:0008134" > "GO:0051879" "GO:0035035" > [50] "GO:0043565" "GO:0046982" "GO:0046982" > > Yup, 0032963 is there, so why missed? So >> org.Hs.egGO2EG[["GO:0032963"]] > IEA IMP > "3091" "7148" > And I trimmed IEA. But Amigo indicates the evidence is ISS. > > So we have > Two not there > One my mistake and > One org.Hs.eg.db lists as IEA and Amigo as ISS. 
> > I suppose since this question can be answered quite easily with Amigo and > they update Amigo assocdb weekly, I should just stick with Amigo for > questions like this. ... I wouldn't want my results to change every week (Sean said every month?) so I'll stick with Bioconductor ;) Maybe there is a way in Amigo to go back in time to the same date as the Bioc package? (there is a time machine in Bioconductor, just use the previous release(s) of R / Bioc). > But R/BioC is more fun. And once you have the commands > in the R.app history, redoing the event is painless, sort of. Capture your well-trod path into a function, and 'source' it when you need to; it really will be painless. Here's my (untested) understanding of what you're after, taken from the various pieces in the email... getGo <- function(likeTerm, inOntology, excludeEvidence) { require(GO.db) require(org.Hs.eg.db) terms <- Term(GOTERM) ontologies <- Ontology(GOTERM) idx <- grepl(likeTerm, terms) & (inOntology == ontologies) myterms <- terms[idx] egids <- mget(names(myterms), org.Hs.egGO2EG, ifnotfound=NA) egids <- lapply(egids, function(elt) { ok <- !(names(elt) %in% excludeEvidence) elt[ok] }) egids[!sapply(egids, length) == 0] } Using the suggestion from Vince to make direct SQL quries might make this _really_ fun. Martin > > Again, thank you. > > >> From: Martin Morgan <mtmorgan at="" fhcrc.org=""> >> Date: Mon, 01 Mar 2010 19:49:05 -0800 >> To: Loren Engrav <engrav at="" u.washington.edu=""> >> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> >> Subject: Re: [BioC] GO's to gene's >> >> On 03/01/2010 06:34 PM, Loren Engrav wrote: >>> Thank you >>> You are clearly very good at this >>> >>> So to check it all out I did it manually on Amigo. 
Amigo found 33 genes >>> (limited to Human and omitting IEA) >>> >>> The org.HS.eg.db method found 29 of the 33 but did not find >>> CST3 (1471) GO:0010711 IEP >>> HIF1A (3091) GO:0032963 ISS >>> IL6R (3570), GO:0032966 IDA and >>> TRAM2 (9697) GO:0032964 IMP >>> >>> I suppose to figure out, for example, why org.Hs.eg.db does not map 9697 to >>> GO:0032964 is complex >> >>> names(org.Hs.egGO[["9697"]]) >> [1] "GO:0015031" "GO:0065002" "GO:0016020" "GO:0016021" >> >> Hmm, what are the offspring / ancestors of GO:0032964 ? >> >>> GOBPOFFSPRING[["GO:0032964"]] >> [1] "GO:0032965" "GO:0032966" "GO:0032967" >>> GOBPANCESTOR[["GO:0032964"]] >> [1] "all" "GO:0008152" "GO:0008150" "GO:0009058" "GO:0009059" >> [6] "GO:0032501" "GO:0032963" "GO:0043170" "GO:0044236" "GO:0044259" >> >> Nope nothing jumping out. Where's the GO data coming from? >> >>> org.Hs.eg() ## or GO() >> [snip] >> Date for GO data: 20090830 >> >> Whereas AMIGO says (at the bottom of each page) >> >> GO database release 2010-02-27 >> >> so that looks like a likely issue that would require some more >> substantial investigation. Merits of using a 'current' db (Amigo) vs a >> 'versioned' db (GO.db)? See mailing list archives, e.g., current >> state-of-knowledge vs. reproducibility (how would we redo the analysis >> we did last month and get the same results with AMIGO?). >> >> On the other hand >> >>> org.Hs.egGO2EG[["GO:0010711"]] >> IEP >> "1471" >>> GOTERM[["GO:0010711"]] >> GOID: GO:0010711 >> Term: negative regulation of collagen catabolic process >> Ontology: BP >> Definition: Any process that decreases the rate, frequency or extent of >> collagen catabolism. Collagen catabolism is the proteolytic >> chemical reactions and pathways resulting in the breakdown of >> collagen in the extracellular matrix. 
>> Synonym: down regulation of collagen catabolic process >> Synonym: down-regulation of collagen catabolic process >> Synonym: downregulation of collagen catabolic process >> Synonym: inhibition of collagen catabolic process >> Synonym: negative regulation of collagen breakdown >> Synonym: negative regulation of collagen catabolism >> Synonym: negative regulation of collagen degradation >> >> so why didn't we find that one? >> >>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) >>> "GO:0010711" %in% names(terms) >> [1] TRUE >>> terms[["GO:0010711"]] >> [1] "negative regulation of collagen catabolic process" >> >> yep it's there >> >>> ontologies <- Ontology(GOTERM) >>> ontologies[["GO:0010711"]] >> [1] "BP" >>> collagen <- terms[grepl("collagen", terms) & ("BP" == ontologies)] >>> collagen[["GO:0010711"]] >> [1] "negative regulation of collagen catabolic process" >> >> yep it's there (or were we looking for MF, as below?). >> >>> egids[["GO:0010711"]] >> IEP >> "1471" >> >> yep it's there. So this makes me think it's a programming error or a >> miscommunication. I'd suggest you write a little function >> >> getGO <- >> function(termLike, ontology, exludeEvidence) >> { >> ## a few lines of code here, representing the query you perform >> } >> >> and perhaps sharing that with the list will shed some light. 
>>
>> Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
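A minimal sketch of the `getGO` helper suggested above, not anyone's posted solution. It assumes the three lookup tables are handed in as plain R objects: the `terms`, `ontologies` and `go2eg` arguments are additions made here so that the example runs without Bioconductor installed. The toy values are copied from output quoted in this thread, except the term name for GO:0032963, which is not quoted in the thread.

```r
## Find GO terms whose name contains a string, keep one ontology, and return
## the annotated genes with unwanted evidence codes removed.
getGO <- function(termLike, ontology, excludeEvidence = character(),
                  terms, ontologies, go2eg)
{
    ## GO ids whose term text matches and whose ontology is the requested one
    hit <- grepl(termLike, terms, fixed = TRUE) &
        (ontologies[names(terms)] == ontology)
    ids <- names(terms)[which(hit)]
    ## look up genes; GO ids with no gene annotation are simply skipped
    egids <- go2eg[intersect(ids, names(go2eg))]
    ## names of each element are evidence codes, so filter on names
    egids <- lapply(egids, function(elt) elt[!(names(elt) %in% excludeEvidence)])
    ## drop GO ids left with no genes
    egids[vapply(egids, length, integer(1)) != 0]
}

## Toy stand-ins for Term(GOTERM), Ontology(GOTERM) and as.list(org.Hs.egGO2EG)
terms <- c("GO:0010711" = "negative regulation of collagen catabolic process",
           "GO:0032963" = "collagen metabolic process",  # name not quoted in thread
           "GO:0005518" = "collagen binding")
ontologies <- c("GO:0010711" = "BP", "GO:0032963" = "BP", "GO:0005518" = "MF")
go2eg <- list("GO:0010711" = c(IEP = "1471"),
              "GO:0032963" = c(IEA = "3091", IMP = "7148"))

res <- getGO("collagen", "BP", "IEA", terms, ontologies, go2eg)
print(res)
```

With the annotation packages loaded, the same call would presumably be `getGO("collagen", "BP", "IEA", Term(GOTERM), Ontology(GOTERM), as.list(org.Hs.egGO2EG))`. Passing the tables in explicitly also makes it easy to rerun the query against a different package version and compare.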
On Mon, Mar 1, 2010 at 9:34 PM, Loren Engrav <engrav at u.washington.edu> wrote:
> Thank you
> You are clearly very good at this
>
> So to check it all out I did it manually on Amigo. Amigo found 33 genes
> (limited to Human and omitting IEA)
>
> The org.Hs.eg.db method found 29 of the 33 but did not find
> CST3 (1471) GO:0010711 IEP
> HIF1A (3091) GO:0032963 ISS
> IL6R (3570), GO:0032966 IDA and
> TRAM2 (9697) GO:0032964 IMP
>
> I suppose to figure out, for example, why org.Hs.eg.db does not map 9697 to
> GO:0032964 is complex

Amigo is updated each month, I believe. The org.Hs.eg.db package is released only every 6 months (last release, October, 2009), so there might be some justifiable differences between the two sources.

Sean
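Before attributing all four differences to release timing, it may help to check each AmiGO row against what the package actually returned. The sketch below does that with values copied from this thread (the four AmiGO rows and the matching entries of the `egids` listing); the `classify` helper and the status labels are illustrative names, not part of any package.

```r
## The four AmiGO annotations reported as missing: GO id, Entrez id, evidence
amigo <- data.frame(
    go       = c("GO:0010711", "GO:0032963", "GO:0032966", "GO:0032964"),
    entrez   = c("1471", "3091", "3570", "9697"),
    evidence = c("IEP", "ISS", "IDA", "IMP"),
    stringsAsFactors = FALSE)

## What org.Hs.egGO2EG returned for the same GO ids (names are evidence codes)
pkg <- list(
    "GO:0010711" = c(IEP = "1471"),
    "GO:0032963" = c(IEA = "3091", IMP = "7148"),
    "GO:0032964" = c(IEA = "871", IMP = "1277", IMP = "1281",
                     TAS = "1281", IMP = "1289"),
    "GO:0032966" = c(IDA = "3569", IC = "4261"))

## Compare one AmiGO row with the package entry for the same GO id
classify <- function(go, entrez, evidence) {
    elt <- pkg[[go]]
    if (!(entrez %in% elt))
        return("gene absent from package")
    if (evidence %in% names(elt)[elt == entrez])
        return("present, same evidence")
    "present, different evidence"
}

amigo$status <- mapply(classify, amigo$go, amigo$entrez, amigo$evidence,
                       USE.NAMES = FALSE)
print(amigo)
```

On these values only 3570 and 9697 are absent from the package data, which fits the release-lag explanation. 1471 is present with the same evidence code, and 3091 is present but tagged IEA rather than ISS, so an IEA filter would drop it.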
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Good responses all around. Yes, our annotations are updated twice a year to go with each release, so the next update is actually "imminent". The advantage of using Amigo is that it will often be ever so slightly more current, but the advantage of using our packages is that they will always give you the same answers for a given version number. So we are doing this as a trade-off in exchange for greater reproducibility in your research. This is so that when you go to publish your results, you can state that they were done with Bioconductor version X.Y, and know that the people who read your paper in two years will still be able to get the same answers that you did.

Marc

On 03/01/2010 07:49 PM, Martin Morgan wrote:
> On 03/01/2010 06:34 PM, Loren Engrav wrote:
>> Thank you
>> You are clearly very good at this
>>
>> So to check it all out I did it manually on Amigo. Amigo found 33 genes
>> (limited to Human and omitting IEA)
>>
>> The org.Hs.eg.db method found 29 of the 33 but did not find
>> CST3 (1471) GO:0010711 IEP
>> HIF1A (3091) GO:0032963 ISS
>> IL6R (3570), GO:0032966 IDA and
>> TRAM2 (9697) GO:0032964 IMP
>>
>> I suppose to figure out, for example, why org.Hs.eg.db does not map 9697 to
>> GO:0032964 is complex
>
> > names(org.Hs.egGO[["9697"]])
> [1] "GO:0015031" "GO:0065002" "GO:0016020" "GO:0016021"
>
> Hmm, what are the offspring / ancestors of GO:0032964 ?
>
> > GOBPOFFSPRING[["GO:0032964"]]
> [1] "GO:0032965" "GO:0032966" "GO:0032967"
> > GOBPANCESTOR[["GO:0032964"]]
> [1] "all"        "GO:0008152" "GO:0008150" "GO:0009058" "GO:0009059"
> [6] "GO:0032501" "GO:0032963" "GO:0043170" "GO:0044236" "GO:0044259"
>
> Nope nothing jumping out. Where's the GO data coming from?
>
> > org.Hs.eg() ## or GO()
> [snip]
> Date for GO data: 20090830
>
> Whereas AMIGO says (at the bottom of each page)
>
> GO database release 2010-02-27
>
> so that looks like a likely issue that would require some more
> substantial investigation. Merits of using a 'current' db (Amigo) vs a
> 'versioned' db (GO.db)? See mailing list archives, e.g., current
> state-of-knowledge vs. reproducibility (how would we redo the analysis
> we did last month and get the same results with AMIGO?).
>
> On the other hand
>
> > org.Hs.egGO2EG[["GO:0010711"]]
>    IEP
> "1471"
> > GOTERM[["GO:0010711"]]
> GOID: GO:0010711
> Term: negative regulation of collagen catabolic process
> Ontology: BP
> Definition: Any process that decreases the rate, frequency or extent of
>     collagen catabolism. Collagen catabolism is the proteolytic
>     chemical reactions and pathways resulting in the breakdown of
>     collagen in the extracellular matrix.
> Synonym: down regulation of collagen catabolic process
> Synonym: down-regulation of collagen catabolic process
> Synonym: downregulation of collagen catabolic process
> Synonym: inhibition of collagen catabolic process
> Synonym: negative regulation of collagen breakdown
> Synonym: negative regulation of collagen catabolism
> Synonym: negative regulation of collagen degradation
>
> so why didn't we find that one?
>
> > terms <- Term(GOTERM) # or maybe Definition(GOTERM)
> > "GO:0010711" %in% names(terms)
> [1] TRUE
> > terms[["GO:0010711"]]
> [1] "negative regulation of collagen catabolic process"
>
> yep it's there
>
> > ontologies <- Ontology(GOTERM)
> > ontologies[["GO:0010711"]]
> [1] "BP"
> > collagen <- terms[grepl("collagen", terms) & ("BP" == ontologies)]
> > collagen[["GO:0010711"]]
> [1] "negative regulation of collagen catabolic process"
>
> yep it's there (or were we looking for MF, as below?).
>
> > egids[["GO:0010711"]]
>    IEP
> "1471"
>
> yep it's there. So this makes me think it's a programming error or a
> miscommunication. I'd suggest you write a little function
>
> getGO <-
>     function(termLike, ontology, excludeEvidence)
> {
>     ## a few lines of code here, representing the query you perform
> }
>
> and perhaps sharing that with the list will shed some light.
> > Martin > > > >> Thank you >> >> >> >>> From: Martin Morgan <mtmorgan at="" fhcrc.org=""> >>> Date: Mon, 01 Mar 2010 05:16:48 -0800 >>> To: Loren Engrav <engrav at="" u.washington.edu=""> >>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> >>> Subject: Re: [BioC] GO's to gene's >>> >>> On 02/28/2010 09:01 PM, Loren Engrav wrote: >>> >>>> So I checked >>>> >>>>> collagen >>>>> >>>> And this list matches Amigo >>>> So then would appear the issue lies in >>>> >>>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA) >>>>> >>>> Some of the names are finding no associated genes in org.Hs.egGO2EG and so >>>> appear as NA >>>> True? Possible? >>>> >>> yes. GO is not H. sapiens specific and ENTREZ ids are not 100% >>> comprehensive, so some GO terms do not map to ENTREZ ids. >>> >>> >>>>>> Also I would like to omit the IEA group >>>>>> >>> maybe >>> >>> egids <- lapply(egids, function(elt) elt[names(elt) != "IEA"]) >>> egids[sapply(egids, length) != 0] >>> >>> Martin >>> >>> >>>> My version of org.Hs.egGO2EG is 2.3.6 >>>> >>>> >>>> >>>> >>>> >>>> >>>>> From: Loren Engrav <engrav at="" u.washington.edu=""> >>>>> Date: Sun, 28 Feb 2010 20:33:05 -0800 >>>>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> >>>>> Conversation: [BioC] GO's to gene's >>>>> Subject: Re: [BioC] GO's to gene's >>>>> >>>>> Oopps, Amigo says there are 20 such terms, not 68 as I said before, cuz I >>>>> retrieved only BP >>>>> >>>>> >>>>> >>>>>> From: Loren Engrav <engrav at="" u.washington.edu=""> >>>>>> Date: Sun, 28 Feb 2010 20:28:17 -0800 >>>>>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at="" stat.math.ethz.ch=""> >>>>>> Conversation: [BioC] GO's to gene's >>>>>> Subject: Re: [BioC] GO's to gene's >>>>>> >>>>>> Ok thank you >>>>>> I now show >>>>>> >>>>>>> sessionInfo() >>>>>>> >>>>>> R version 2.10.1 (2009-12-14) >>>>>> i386-apple-darwin9.8.0 >>>>>> >>>>>> locale: >>>>>> [1] 
en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8 >>>>>> >>>>>> attached base packages: >>>>>> [1] stats graphics grDevices utils datasets methods base >>>>>> >>>>>> other attached packages: >>>>>> [1] org.Hs.eg.db_2.3.6 GO.db_2.3.5 RSQLite_0.8-3 >>>>>> AnnotationDbi_1.8.1 DBI_0.2-5 >>>>>> [6] Biobase_2.6.1 >>>>>> >>>>>> loaded via a namespace (and not attached): >>>>>> [1] tools_2.10.1 >>>>>> >>>>>> And all commands pass with no errors, however I see >>>>>> >>>>>> >>>>>>> egids >>>>>>> >>>>>> $`GO:0010711` >>>>>> IEP >>>>>> "1471" >>>>>> >>>>>> $`GO:0030199` >>>>>> IEA IEA ISS IEA IMP IMP IMP IMP NAS >>>>>> IMP NAS IMP ISS >>>>>> "302" "304" "538" "871" "1277" "1278" "1280" "1281" "1281" >>>>>> "1289" "1289" "1290" "1290" >>>>>> NAS IDA NAS IEA IEA IEA IEA IEA NAS >>>>>> ISS IDA ISS NAS >>>>>> "1301" "1302" "1303" "1805" "2296" "2303" "4010" "4015" "4060" >>>>>> "4763" "7042" "7046" "7373" >>>>>> NAS NAS >>>>>> "9508" "50509" >>>>>> >>>>>> $`GO:0030574` >>>>>> IEA IEA IEA IEA IEA IEA IEA IEA >>>>>> IEA IEA IEA >>>>>> "4312" "4313" "4314" "4316" "4317" "4318" "4319" "4320" >>>>>> "4322" "4325" "4327" >>>>>> IEA IDA IMP NAS IEA NAS IEA IEA >>>>>> IEA IEA >>>>>> "5184" "5645" "5645" "5653" "5657" "9508" "9509" "56547" >>>>>> "64066" "140766" >>>>>> >>>>>> $`GO:0032963` >>>>>> IEA IMP >>>>>> "3091" "7148" >>>>>> >>>>>> $`GO:0032964` >>>>>> IEA IMP IMP TAS IMP >>>>>> "871" "1277" "1281" "1281" "1289" >>>>>> >>>>>> $`GO:0032966` >>>>>> IDA IC >>>>>> "3569" "4261" >>>>>> >>>>>> $`GO:0032967` >>>>>> ISS IDA IDA IC IMP TAS IMP >>>>>> "265" "2147" "2149" "3066" "7040" "7040" "7043" >>>>>> >>>>>> $`GO:0033342` >>>>>> IMP >>>>>> "23560" >>>>>> >>>>>> So many GO terms containing the word "collagen" are not listed, like >>>>>> 0004656 >>>>>> 0005518 >>>>>> etc >>>>>> Amigo claims there are 68 such terms and the list above has only 8 >>>>>> What did I do wrong? 
>>>>>> Also I would like to omit the IEA group
>>>>>>
>>>>>> Thank you
>>>>>>
>>>>>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>>>>>> Date: Sun, 28 Feb 2010 19:30:34 -0800
>>>>>>> To: Loren Engrav <engrav at u.washington.edu>
>>>>>>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>>>>>>> Subject: Re: [BioC] GO's to gene's
>>>>>>>
>>>>>>> On 02/28/2010 07:17 PM, Loren Engrav wrote:
>>>>>>>
>>>>>>>> Thank you both
>>>>>>>> Given my skills, it might be easier/quicker to do it "manually" with
>>>>>>>> Amigo
>>>>>>>> But I am trying both methods
>>>>>>>>
>>>>>>>> For the second method I get
>>>>>>>>
>>>>>>>> > library(GO.db)
>>>>>>>> Loading required package: AnnotationDbi
>>>>>>>> Loading required package: Biobase
>>>>>>>>
>>>>>>>> Welcome to Bioconductor
>>>>>>>>
>>>>>>>> Vignettes contain introductory material. To view, type
>>>>>>>> 'openVignette()'. To cite Bioconductor, see
>>>>>>>> 'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>>>>>>
>>>>>>>> Loading required package: DBI
>>>>>>>>
>>>>>>>> > terms <- Term(GOTERM)
>>>>>>>> Error in function (classes, fdef, mtable) :
>>>>>>>> unable to find an inherited method for function "Term", for signature
>>>>>>>> "GOTermsAnnDbBimap"
>>>>>>>>
>>>>>>>> > sessionInfo()
>>>>>>>> R version 2.9.2 Patched (2009-09-05 r49613)
>>>>>>>> i386-apple-darwin9.8.0
>>>>>>>>
>>>>>>>> locale:
>>>>>>>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>>>>>>>
>>>>>>>> attached base packages:
>>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>>
>>>>>>> Update to R version 2.10 and associated Bioc packages, or for a (much)
>>>>>>> slower solution (you'll want to check that Term and Ontology return ids
>>>>>>> in identical order)
>>>>>>>
>>>>>>> terms = eapply(GOTERM, Term)
>>>>>>>
>>>>>>> etc. I have
>>>>>>>
>>>>>>> > sessionInfo()
>>>>>>> R version 2.10.1 Patched (2010-02-23 r51168)
>>>>>>> x86_64-unknown-linux-gnu
>>>>>>>
>>>>>>> locale:
>>>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>>>
>>>>>>> attached base packages:
>>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>>
>>>>>>> other attached packages:
>>>>>>> [1] GO.db_2.3.5 RSQLite_0.7-3 DBI_0.2-4
>>>>>>> [4] AnnotationDbi_1.8.1 Biobase_2.6.1
>>>>>>>
>>>>>>> loaded via a namespace (and not attached):
>>>>>>> [1] tools_2.10.1
>>>>>>>
>>>>>>> Martin
>>>>>>>
>>>>>>>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>>>>>>>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>>>>>>>>> To: Vincent Carey <stvjc at channing.harvard.edu>
>>>>>>>>> Cc: Loren Engrav <engrav at u.washington.edu>,
>>>>>>>>> "bioconductor at stat.math.ethz.ch"
>>>>>>>>> <bioconductor at stat.math.ethz.ch>
>>>>>>>>> Subject: Re: [BioC] GO's to gene's
>>>>>>>>>
>>>>>>>>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>>>>>>>>
>>>>>>>>>> Perhaps there is a package with such functionality.
>>>>>>>>>> However, with the GO.db package in place, you need to do a little
>>>>>>>>>> programming, perhaps along the lines of
>>>>>>>>>>
>>>>>>>>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>>>>>>>>   require(GO.db, quietly = TRUE)
>>>>>>>>>>   gc = GO_dbconn()
>>>>>>>>>>   quer.1 = paste("select go_id, term from go_term where",
>>>>>>>>>>     attr, "like('%")
>>>>>>>>>>   quer.2 = "%') and ontology = '"
>>>>>>>>>>   quer.3 = "'"
>>>>>>>>>>   quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>>>>>>>>     sep = "")
>>>>>>>>>>   dbGetQuery(gc, quer)
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> whereby
>>>>>>>>>>
>>>>>>>>>> > querGO("collagen", "term")
>>>>>>>>>>   go_id      term
>>>>>>>>>> 1 GO:0004656 procollagen-proline 4-dioxygenase activity
>>>>>>>>>> 2 GO:0005518 collagen binding
>>>>>>>>>> 3 GO:0008475 procollagen-lysine 5-dioxygenase activity
>>>>>>>>>> 4 GO:0019797 procollagen-proline 3-dioxygenase activity
>>>>>>>>>> 5 GO:0019798 procollagen-proline dioxygenase activity
>>>>>>>>>> 6 GO:0033823 procollagen glucosyltransferase activity
>>>>>>>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>>>>>>>>> 8 GO:0050211 procollagen galactosyltransferase activity
>>>>>>>>>> 9 GO:0070052 collagen V binding
>>>>>>>>>>
>>>>>>>>> Also
>>>>>>>>>
>>>>>>>>> library(GO.db)
>>>>>>>>> terms <- Term(GOTERM) # or maybe Definition(GOTERM) ?
>>>>>>>>> ontologies <- Ontology(GOTERM)
>>>>>>>>> collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
>>>>>>>>>
>>>>>>>>> and the next step,
>>>>>>>>>
>>>>>>>>> library(org.Hs.eg.db)
>>>>>>>>> egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>>>>>>>> egids <- egids[!is.na(egids)]
>>>>>>>>>
>>>>>>>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Is there a BioC package that will find all the GO terms containing
>>>>>>>>>>> some word, like perhaps "collagen"
>>>>>>>>>>> And then find all the genes contained within those found terms
>>>>>>>>>>>
>>>>>>>>>>> I scanned
>>>>>>>>>>> GoProfiles
>>>>>>>>>>> GOSemSim
>>>>>>>>>>> GOstats
>>>>>>>>>>> GoTools and
>>>>>>>>>>> TopGO
>>>>>>>>>>>
>>>>>>>>>>> And could not determine that any would do that.
>>>>>>>>>>>
>>>>>>>>>>> Thank you.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Martin Morgan
>>>>>>>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>>>>>>>> 1100 Fairview Ave. N.
>>>>>>>>> PO Box 19024 Seattle, WA 98109
>>>>>>>>>
>>>>>>>>> Location: Arnold Building M1 B861
>>>>>>>>> Phone: (206) 667-2793
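
Pulling the suggestions in this thread together, here is a minimal base R sketch of the three steps: match terms by keyword within one ontology, drop GO ids that came back as NA, and omit genes annotated only by a given evidence code such as IEA. It is an illustration under stated assumptions, not the posters' exact code. The toy vectors stand in for `Term(GOTERM)`, `Ontology(GOTERM)` and the list returned by `mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)`; their values are copied from the output quoted above. `find_terms` and `drop_evidence` are hypothetical helper names, not functions from any Bioconductor package, and `GO:0000000` and `GO:0000001` are made-up ids used only to exercise the NA and IEA-only cases.

```r
# Stand-ins for Term(GOTERM) and Ontology(GOTERM): vectors named by GO id.
# Ids and term names are taken from the querGO() output quoted in the thread.
terms <- c(
  "GO:0004656" = "procollagen-proline 4-dioxygenase activity",
  "GO:0005518" = "collagen binding",
  "GO:0070052" = "collagen V binding"
)
# Deliberately in a different order, to show why matching by name is safer.
ontologies <- c("GO:0070052" = "MF", "GO:0004656" = "MF", "GO:0005518" = "MF")

# Step 1: terms whose name matches `pattern` within one ontology.
find_terms <- function(pattern, terms, ontologies, ont = "MF") {
  # Align ontologies to terms by GO id rather than by position.
  onts <- ontologies[names(terms)]
  hit <- grepl(pattern, terms, ignore.case = TRUE) & !is.na(onts) & onts == ont
  terms[hit]
}

# Steps 2 and 3: clean up a GO id -> Entrez id list whose element names
# are evidence codes (the shape of the egids output shown in the thread).
drop_evidence <- function(egids, codes = "IEA") {
  # Unmapped GO ids are a lone NA when ifnotfound = NA was used.
  unmapped <- vapply(egids, function(elt) all(is.na(elt)), logical(1))
  mapped <- egids[!unmapped]
  # Remove genes whose evidence code is in `codes`.
  kept <- lapply(mapped, function(elt) elt[!(names(elt) %in% codes)])
  # Drop GO ids left with no genes at all.
  kept[lengths(kept) > 0]
}

collagen <- find_terms("collagen", terms, ontologies)

# Stand-in for the mget() result; the first three entries are from the thread.
egids <- list(
  "GO:0032963" = c(IEA = "3091", IMP = "7148"),
  "GO:0032964" = c(IEA = "871", IMP = "1277", IMP = "1281",
                   TAS = "1281", IMP = "1289"),
  "GO:0033342" = c(IMP = "23560"),
  "GO:0000000" = NA,              # hypothetical id with no mapping
  "GO:0000001" = c(IEA = "4312")  # hypothetical id with IEA evidence only
)

filtered <- drop_evidence(egids)
# Unique Entrez ids across all remaining terms.
genes <- unique(unlist(filtered, use.names = FALSE))
print(names(filtered))
print(genes)
```

Looking ontologies up by GO id sidesteps the concern raised above about `Term` and `Ontology` returning ids in identical order. With the real packages loaded, the same helpers should accept `Term(GOTERM)`, `Ontology(GOTERM)` and the `mget()` result in place of the toy data.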