Question

GO term as "keytype" in GO.db


Robert Castelo ★ 3.3k

@rcastelo


Barcelona/Universitat Pompeu Fabra

hi,

i was about to fetch GO identifiers (IDs) matching certain GO terms using the GO.db package, but i've found out that GO.db only considers GO IDs as possible keys:

    suppressPackageStartupMessages(library(GO.db))
    keytypes(GO.db)
    [1] "GOID"

in section 0.4 of the AnnotationDbi vignette on "Using select with GO.db" an example is given using GO IDs as keys, but i think it would be handy to also interrogate which GO IDs match or contain a particular term such as "RNA binding", for example, doing either:

* for matching

    select(GO.db, keys="RNA binding", cols="GOID", keytype="TERM")

* for containing

    allTerms <- keys(GO.db, keytype="TERM")
    rnabindingterms <- allTerms[grep("RNA binding", allTerms)]
    select(GO.db, keys=rnabindingterms, cols="GOID", keytype="TERM")

once you have the GO IDs you can interrogate which genes have such a GO term annotated to them.

currently this is not possible because the only key allowed is GOID; whatever keytype is requested, keys() returns GO IDs:

    head(keys(GO.db, keytype="TERM"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"
    head(keys(GO.db, keytype="DEFINITION"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"
    head(keys(GO.db, keytype="ONTOLOGY"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"

while in other packages, such as org.Hs.eg.db, basically all columns of information can be used as keys:

    library(org.Hs.eg.db)
    keytypes(org.Hs.eg.db)
     [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"      "ACCNUM"
     [6] "ALIAS"        "CHR"          "CHRLOC"       "CHRLOCEND"    "ENZYME"
    [11] "MAP"          "PATH"         "PMID"         "REFSEQ"       "SYMBOL"
    [16] "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"
    [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"     "GOALL"
    [26] "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"         "UCSCKG"

i'm also aware that GO.db defines several hash tables, among them GOTERM, which can be used in the following way for my purpose:

    goterms <- unlist(eapply(GOTERM, function(x) x@Term))
    which(goterms == "RNA binding")
    GO:0003723
          2714

but the first step is much slower than using the 'select' method, and i would prefer to use a more homogeneous way to pull all data in GO.db.

i look forward to your comments on this.

best regards,
robert.

ps: sessionInfo()

    R version 3.0.0 (2013-04-03)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
     [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
     [7] LC_PAPER=C                LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] org.Hs.eg.db_2.9.0   GO.db_2.9.0          RSQLite_0.11.3
     [4] DBI_0.2-6            AnnotationDbi_1.22.3 Biobase_2.20.0
     [7] BiocGenerics_0.6.0   vimcom_0.9-8         setwidth_1.0-3
    [10] colorout_1.0-0

    loaded via a namespace (and not attached):
    [1] IRanges_1.18.0 stats4_3.0.0
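As a stopgap while TERM is not accepted as a keytype, the lookup can still go through select by pulling the term for every GO ID once and filtering the resulting table. The following is only a sketch: `goids_for_term` is a hypothetical helper name, not part of GO.db, and the code spells the argument `columns` as newer AnnotationDbi releases do (the post's version 1.22 uses `cols`).

```r
# Hypothetical helper (not part of GO.db): return the GO IDs whose term
# matches `term`. `tab` must be a data.frame with columns GOID and TERM.
goids_for_term <- function(tab, term, exact = TRUE) {
  hit <- if (exact) {
    which(tab$TERM == term)              # whole-term match
  } else {
    grep(term, tab$TERM, fixed = TRUE)   # term contains the string
  }
  tab$GOID[hit]
}

# Only runs when GO.db is installed.
if (requireNamespace("GO.db", quietly = TRUE)) {
  suppressPackageStartupMessages(library(GO.db))
  # One select() call maps every GO ID to its term.
  # Assumption: `columns` is the argument name; older releases call it `cols`.
  go_tab <- select(GO.db, keys = keys(GO.db),
                   columns = "TERM", keytype = "GOID")
  matching   <- goids_for_term(go_tab, "RNA binding")
  containing <- goids_for_term(go_tab, "RNA binding", exact = FALSE)
  print(matching)
  print(head(containing))
}
```

The table is built once and can be reused for any number of term lookups, so the cost of the full select call is paid a single time.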

GO AnnotationDbi • 2.2k views

updated 11.0 years ago by Vincent J. Carey, Jr. 6.7k • written 11.0 years ago by Robert Castelo ★ 3.3k

score 0 · Answer 1 · 2013-04-30

i made a similar suggestion privately some time ago. perhaps it will be taken up, but it might be better if we left select alone and created a key generator for GO terms, to feed into select. part of the resistance to taking the terms on is, i believe, the need for any practically useful solution to deal with approximate matching, which is a sort of scope creep for select.

so you'd have

    select(..., keytype="GOID", keys=got2i("RNA binding"), ...)

and you can define how got2i maps from strings, say, to GO IDs.
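One way the got2i key generator proposed in this reply might be written is sketched below. The reply names the function but does not define it, so the signature, the `terms` and `max.distance` parameters, and the use of base R's `agrep` for approximate matching are all assumptions made for illustration. When `terms` is not supplied, the sketch assumes that `Term(GOTERM)` returns the term names as a character vector named by GO ID.

```r
# Hypothetical key generator in the spirit of the reply's got2i().
# pattern      : term text to look up
# terms        : character vector of GO terms named by GO ID; if NULL it is
#                taken from GO.db (assumption: Term(GOTERM) is named by GO ID)
# max.distance : 0 for a case-insensitive whole-term match; a value > 0 is
#                passed to agrep(), which also matches inside longer terms
got2i <- function(pattern, terms = NULL, max.distance = 0) {
  if (is.null(terms)) {
    if (!requireNamespace("GO.db", quietly = TRUE))
      stop("GO.db is needed when 'terms' is not supplied")
    terms <- AnnotationDbi::Term(GO.db::GOTERM)
  }
  hit <- if (max.distance > 0) {
    agrep(pattern, terms, max.distance = max.distance, ignore.case = TRUE)
  } else {
    which(tolower(terms) == tolower(pattern))
  }
  unique(names(terms)[hit])
}
```

With GO.db attached, the result would feed straight into select, for example `select(GO.db, keys = got2i("RNA binding"), columns = "TERM", keytype = "GOID")` (`cols` instead of `columns` on the AnnotationDbi version shown in the question). Keeping the matching rules inside the key generator leaves select itself unchanged, which is the point made in the reply.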