GO term as "keytype" in GO.db
Robert Castelo (@rcastelo), Barcelona/Universitat Pompeu Fabra
Hi,

I was about to fetch GO identifiers (IDs) matching certain GO terms using the GO.db package, but I've found out that GO.db only accepts GO IDs as keys:

    suppressStartupMessages(library(GO.db))
    keytypes(GO.db)
    [1] "GOID"

Section 0.4 of the AnnotationDbi vignette, "Using select with GO.db", gives an example that uses GO IDs as keys. I think it would also be handy to query which GO IDs match, or contain, a particular term such as "RNA binding", for example by doing either:

* for matching:

    select(GO.db, keys="RNA binding", cols="GOID", keytype="TERM")

* for containing:

    allTerms <- keys(GO.db, keytype="TERM")
    rnabindingterms <- allTerms[grep("RNA binding", allTerms)]
    select(GO.db, keys=rnabindingterms, cols="GOID", keytype="TERM")

Once you have the GO IDs, you can query which genes have those GO terms annotated to them.

Currently this is not possible because the only key allowed is GOID; asking for keys of any other type just returns GO IDs:

    head(keys(GO.db, keytype="TERM"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"
    head(keys(GO.db, keytype="DEFINITION"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"
    head(keys(GO.db, keytype="ONTOLOGY"))
    [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
    [6] "GO:0000009"

By contrast, in other packages such as org.Hs.eg.db, basically every column of information can be used as a key:

    library(org.Hs.eg.db)
    keytypes(org.Hs.eg.db)
     [1] "ENTREZID"     "PFAM"         "IPI"          "PROSITE"      "ACCNUM"
     [6] "ALIAS"        "CHR"          "CHRLOC"       "CHRLOCEND"    "ENZYME"
    [11] "MAP"          "PATH"         "PMID"         "REFSEQ"       "SYMBOL"
    [16] "UNIGENE"      "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "GENENAME"
    [21] "UNIPROT"      "GO"           "EVIDENCE"     "ONTOLOGY"     "GOALL"
    [26] "EVIDENCEALL"  "ONTOLOGYALL"  "OMIM"         "UCSCKG"

I'm also aware that GO.db defines several hash tables, among them GOTERM, which can be used for my purpose in the following way:

    goterms <- unlist(eapply(GOTERM, function(x) x@Term))
    which(goterms == "RNA binding")
    GO:0003723
          2714

However, the first step is much slower than the select method, and I would prefer a more homogeneous way to pull all data out of GO.db.

I look forward to your comments on this.

Best regards,

Robert.

PS: sessionInfo()

    R version 3.0.0 (2013-04-03)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
     [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
     [7] LC_PAPER=C                LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] org.Hs.eg.db_2.9.0   GO.db_2.9.0          RSQLite_0.11.3
     [4] DBI_0.2-6            AnnotationDbi_1.22.3 Biobase_2.20.0
     [7] BiocGenerics_0.6.0   vimcom_0.9-8         setwidth_1.0-3
    [10] colorout_1.0-0

    loaded via a namespace (and not attached):
    [1] IRanges_1.18.0 stats4_3.0.0
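Until TERM is accepted as a keytype, one workaround for both the "matching" and "containing" lookups is to key `select` on GOID (the one keytype GO.db does accept), retrieve the TERM column for every GO ID, and filter that table in R. This is a minimal sketch under a few assumptions: it targets a current AnnotationDbi, where `select`'s argument is named `columns` rather than the `cols` used in the version shown in the sessionInfo; `goids_for_term` is a hypothetical helper name; and the four-row table is illustrative data used only when GO.db is not installed.

```r
# Hypothetical helper: GO IDs whose TERM equals `term` (exact = TRUE)
# or contains it as a literal substring (exact = FALSE).
goids_for_term <- function(goterms, term, exact = TRUE) {
  hit <- if (exact) {
    goterms$TERM %in% term
  } else {
    grepl(term, goterms$TERM, fixed = TRUE)
  }
  unique(goterms$GOID[hit])
}

# GOID/TERM table, keyed on GOID, the only keytype GO.db accepts.
# Falls back to a small illustrative table when GO.db is not installed.
if (requireNamespace("GO.db", quietly = TRUE)) {
  godb <- GO.db::GO.db
  goterms <- AnnotationDbi::select(godb,
    keys = AnnotationDbi::keys(godb, keytype = "GOID"),
    columns = "TERM", keytype = "GOID")
} else {
  goterms <- data.frame(
    GOID = c("GO:0003723", "GO:0003729", "GO:0000049", "GO:0003677"),
    TERM = c("RNA binding", "mRNA binding", "tRNA binding", "DNA binding"),
    stringsAsFactors = FALSE
  )
}

goids_for_term(goterms, "RNA binding")                 # exact match
goids_for_term(goterms, "RNA binding", exact = FALSE)  # terms containing it
```

The returned IDs can then be passed as `keys` with `keytype="GOID"` to `select` on GO.db, or with `keytype="GO"` to `select` on org.Hs.eg.db to find the annotated genes.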
@vincent-j-carey-jr-4, United States
I made a similar suggestion privately some time ago. Perhaps it will be taken up, but it might be better to leave select alone and create a key generator for GO terms that feeds into select. Part of the resistance to taking the terms on is, I believe, that any practically useful solution would need to deal with approximate matching, which is a sort of scope creep for select. So you'd have

    select(..., keytype="GOID", keys=got2i("RNA binding"), ...)

and you can define how got2i maps from strings, say, to GO IDs.

On Tue, Apr 30, 2013 at 11:50 AM, Robert Castelo <robert.castelo@upf.edu> wrote:
> in section 0.4 of the AnnotationDbi vignette on "Using select with GO.db"
> an example is given with using GO IDs as keys but i think it would be handy
> to interrogate also what GO IDs match or contain a particular term such as
> "rna binding"
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
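A key generator in the spirit of the `got2i` idea can be sketched with base R's `agrep`, which does approximate (edit-distance) substring matching. This is one possible design under stated assumptions, not an existing GO.db or AnnotationDbi function: `got2i` here is a hypothetical function that takes the GOID/TERM table as an argument (for instance one obtained by keying `select` on GOID), and the one-edit tolerance and case-insensitivity are arbitrary defaults.

```r
# Hypothetical key generator in the spirit of got2i(): map a free-text
# query to GO IDs by approximate substring matching (base R agrep)
# against GO term names. `goterms` must have GOID and TERM columns.
got2i <- function(query, goterms, max.distance = 1, ignore.case = TRUE) {
  # Indices of terms containing `query` within `max.distance` edits
  idx <- agrep(query, goterms$TERM, max.distance = max.distance,
               ignore.case = ignore.case, fixed = TRUE)
  unique(goterms$GOID[idx])
}

# Illustrative subset of GO; in practice, build the full table with e.g.
# select(GO.db, keys = keys(GO.db), columns = "TERM", keytype = "GOID")
goterms <- data.frame(
  GOID = c("GO:0003723", "GO:0003729", "GO:0003677"),
  TERM = c("RNA binding", "mRNA binding", "DNA binding"),
  stringsAsFactors = FALSE
)

got2i("rna bindng", goterms)   # tolerates a typo in the query

# The IDs then feed select() as GOID keys, e.g.
# select(GO.db, keys = got2i("RNA binding", goterms),
#        columns = "DEFINITION", keytype = "GOID")
```

With one allowed edit, the query "RNA binding" also returns the ID for "DNA binding", a one-letter difference. Deciding how loose such matching should be is exactly the kind of policy question that argues for keeping it outside select.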