GO term as "keytype" in GO.db


Robert Castelo ★ 3.4k

@rcastelo


Barcelona/Universitat Pompeu Fabra

hi Vince,

this key generator would be a good way to provide an easy solution for the user who doesn't know how to use string-matching solutions such as grep, especially if this function would be smart enough to generate some sensible partial matching strings. i understand that we still would need to access the GO terms as keys in GO.db.

in the meantime, note that the approximate matching problem is just the same as with, let's say, GENENAME in org.Hs.eg.db. in org.Hs.eg.db, however, GENENAME is a key, and therefore it is currently up to the regular expression skills of the user to find the way to match the desired string. with GENENAME as key i can, for instance, quickly interrogate what genes are nuclear receptors using select:

    allkeys <- keys(org.Hs.eg.db, keytype="GENENAME")
    select(org.Hs.eg.db, keys=allkeys[grep("nuclear receptor", allkeys)],
           cols="SYMBOL", keytype="GENENAME")

but i cannot do the same to pull, let's say, "RNA binding" genes using GO.

cheers,
robert.

On 4/30/13 7:25 PM, Vincent Carey wrote:
> i made a similar suggestion privately some time ago. perhaps it will
> be taken up, but it might be better if we left select alone and
> created a key generator for GO terms, to feed into select. part of
> the resistance to taking the terms on is, i believe, the need for any
> practically useful solution to deal with approximate matching, which
> is a sort of scope creep for select.
>
> so you'd have select(..., keytype="GOID", keys=got2i("RNA binding"), ...
>
> and you can define how got2i maps from strings, say, to GOids
>
> On Tue, Apr 30, 2013 at 11:50 AM, Robert Castelo
> <robert.castelo@upf.edu> wrote:
>
> > hi,
> >
> > i was about to fetch GO identifiers (IDs) matching certain GO
> > terms using the GO.db package, but i've found out that GO.db only
> > considers GO IDs as possible keys:
> >
> >     suppressStartupMessages(library(GO.db))
> >     keytypes(GO.db)
> >     [1] "GOID"
> >
> > in section 0.4 of the AnnotationDbi vignette on "Using select with
> > GO.db" an example is given using GO IDs as keys, but i think
> > it would be handy to interrogate also what GO IDs match or contain
> > a particular term such as "rna binding", for example, doing either:
> >
> > * for matching
> >
> >     select(GO.db, keys="RNA binding", cols="GOID", keytype="TERM")
> >
> > * for containing
> >
> >     allTerms <- keys(GO.db, keytype="TERM")
> >     rnabindingterms <- allTerms[grep("RNA binding", allTerms)]
> >     select(GO.db, keys=rnabindingterms, cols="GOID", keytype="TERM")
> >
> > once you have the GO IDs you can interrogate what genes have such a
> > GO term annotated to them.
> >
> > currently this is not possible because the only key allowed is GOID:
> >
> >     head(keys(GO.db, keytype="TERM"))
> >     [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
> >     [6] "GO:0000009"
> >     head(keys(GO.db, keytype="DEFINITION"))
> >     [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
> >     [6] "GO:0000009"
> >     head(keys(GO.db, keytype="ONTOLOGY"))
> >     [1] "GO:0000001" "GO:0000002" "GO:0000003" "GO:0000006" "GO:0000007"
> >     [6] "GO:0000009"
> >
> > while in other packages, such as org.Hs.eg.db, basically all
> > columns of information can be used as keys:
> >
> >     library(org.Hs.eg.db)
> >     keytypes(org.Hs.eg.db)
> >      [1] "ENTREZID"    "PFAM"        "IPI"         "PROSITE"      "ACCNUM"
> >      [6] "ALIAS"       "CHR"         "CHRLOC"      "CHRLOCEND"    "ENZYME"
> >     [11] "MAP"         "PATH"        "PMID"        "REFSEQ"       "SYMBOL"
> >     [16] "UNIGENE"     "ENSEMBL"     "ENSEMBLPROT" "ENSEMBLTRANS" "GENENAME"
> >     [21] "UNIPROT"     "GO"          "EVIDENCE"    "ONTOLOGY"     "GOALL"
> >     [26] "EVIDENCEALL" "ONTOLOGYALL" "OMIM"        "UCSCKG"
> >
> > i'm also aware that GO.db defines several hash tables, among them
> > GOTERM, which can be used in the following way for my purpose:
> >
> >     goterms <- unlist(eapply(GOTERM, function(x) x@Term))
> >     which(goterms == "RNA binding")
> >     GO:0003723
> >           2714
> >
> > but the first step is much slower than using the 'select' method
> > and i would prefer to use a more homogeneous way to pull all data
> > in GO.db
> >
> > i look forward to your comments on this.
> >
> > best regards,
> > robert.
> >
> > ps: sessionInfo()
> > R version 3.0.0 (2013-04-03)
> > Platform: x86_64-unknown-linux-gnu (64-bit)
> >
> > locale:
> >  [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
> >  [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
> >  [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
> >  [7] LC_PAPER=C                LC_NAME=C
> >  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> > [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
> >
> > attached base packages:
> > [1] parallel stats graphics grDevices utils datasets methods
> > [8] base
> >
> > other attached packages:
> >  [1] org.Hs.eg.db_2.9.0   GO.db_2.9.0          RSQLite_0.11.3
> >  [4] DBI_0.2-6            AnnotationDbi_1.22.3 Biobase_2.20.0
> >  [7] BiocGenerics_0.6.0   vimcom_0.9-8         setwidth_1.0-3
> > [10] colorout_1.0-0
> >
> > loaded via a namespace (and not attached):
> > [1] IRanges_1.18.0 stats4_3.0.0
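
for what it's worth, the key generator idea discussed in this thread could be sketched roughly as below. this is only an illustration under stated assumptions: got2i() is not a function provided by GO.db or AnnotationDbi, its arguments (terms, exact, ignore.case) are made up here, and the small toy_terms vector stands in for the real term table so that the example runs without any annotation package installed.

```r
# Hypothetical key generator in the spirit of got2i(); NOT part of GO.db.
# 'terms' is a character vector of GO term names whose names are GO IDs.
got2i <- function(pattern, terms, exact = FALSE, ignore.case = TRUE) {
  stopifnot(is.character(pattern), length(pattern) == 1L,
            is.character(terms), !is.null(names(terms)))
  hay <- terms
  needle <- pattern
  if (ignore.case) {
    # Compare lower-cased copies so "rna binding" also finds "RNA binding".
    hay <- tolower(hay)
    needle <- tolower(needle)
  }
  # exact = TRUE  -> the whole term must equal the pattern ("matching")
  # exact = FALSE -> the term must contain the pattern literally ("containing")
  hit <- if (exact) hay == needle else grepl(needle, hay, fixed = TRUE)
  names(terms)[which(hit)]
}

# Toy stand-in for the term table (assumption: a real run would use the
# full set of terms from GO.db instead of these three entries).
toy_terms <- c("GO:0003723" = "RNA binding",
               "GO:0003729" = "mRNA binding",
               "GO:0003677" = "DNA binding")

got2i("rna binding", toy_terms)                # "GO:0003723" "GO:0003729"
got2i("rna binding", toy_terms, exact = TRUE)  # "GO:0003723"
```

with GO.db attached, the named character vector returned by Term(GOTERM) should be usable as the terms argument, and the resulting GO IDs can then be fed to select() with keytype="GOID", along the lines suggested above. keeping the matching rules inside the generator leaves select() itself untouched.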

GO AnnotationDbi • 2.6k views

12.7 years ago Robert Castelo ★ 3.4k
