Mapping NCBI accession numbers to GO terms
@january-weiner-3999
Last seen 10.2 years ago
Hello,

I'm not sure how to retrieve GO terms associated with the NCBI accession numbers (such as "NM_172496").

I have found references to GOLOCUSID, but I cannot find this environment. I have GOstats and I can access GOTERM, but not GOLOCUSID.

Anyways, I also failed to map NCBI accession numbers to Entrez IDs using BioIDMapper:

  > library(BioIDMapper)
  > data(glist)
  > head( bio.convert( glist, 1, 24 ) )
  Parsing data from UniProt
  200 IDs have been processed
  159 IDs have been processed
  Parsing data from UniProt
  22 IDs have been processed
  No ID found in database. 0 IDs have been processed
  Done...
    P_GI       ACC        P_ENTREZGENEID
  1 "54125119" "A6YK35\r" NA
  2 "54125311" "A6YK35\r" NA
  3 "54125051" "A6YK35\r" NA
  4 "54125369" "A6YK35\r" NA
  5 "54125435" "A7J4K5\r" NA
  6 "54125083" "A6YK35\r" NA

Best regards,

confused January

--
Dr. January Weiner 3
Max Planck Institute for Infection Biology
Charitéplatz 1
D-10117 Berlin, Germany
Web : www.mpiib-berlin.mpg.de
Tel : +49-30-28460514
GO GOstats • 2.4k views
@vincent-j-carey-jr-4
Last seen 9 weeks ago
United States
On Thu, May 20, 2010 at 10:49 AM, January Weiner wrote:
> I'm not sure how to retrieve GO terms associated with the NCBI
> accession numbers (such as "NM_172496").
>
> I have found references to GOLOCUSID, but I cannot find this
> environment. I have GOstats and I can access GOTERM, but not
> GOLOCUSID.

Perhaps this will get you going:

  > library(org.Mm.eg.db)
  > get("NM_172496", org.Mm.egREFSEQ2EG)
  [1] "12808"
  > names(get("12808", org.Mm.egGO))
  [1] "GO:0001843" "GO:0005515"

  > sessionInfo()
  R version 2.12.0 Under development (unstable) (2010-05-03 r51901)
  x86_64-apple-darwin10.3.0

  locale:
  [1] C

  attached base packages:
  [1] stats     graphics  grDevices datasets  tools     utils     methods
  [8] base

  other attached packages:
  [1] org.Mm.eg.db_2.4.1   org.Hs.eg.db_2.4.1   RSQLite_0.9-0
  [4] DBI_0.2-5            AnnotationDbi_1.11.1 Biobase_2.9.0
  [7] weaver_1.15.0        codetools_0.2-2      digest_0.4.2

> Anyways, I also failed to map NCBI accession numbers to Entrez IDs
> using BioIDMapper:

Not Bioconductor; please contact the author of that package for concerns about it.
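The two get() calls above work because the annotation maps can be queried like R environments. A minimal sketch of the same two-hop lookup for several accessions at once follows, using plain base R environments as stand-ins so it runs without any annotation package. The assumptions: the toy maps hold only the one mapping shown in this thread, "NM_000000" is a made-up accession used to show a miss, and the toy eg2go stores GO IDs directly as a character vector, whereas the reply above had to call names() on the result from org.Mm.egGO.

```r
# Toy stand-ins for the annotation maps; the real ones come from org.Mm.eg.db.
# The mapping below is copied from the output shown in this thread.
refseq2eg <- new.env()
assign("NM_172496", "12808", envir = refseq2eg)
eg2go <- new.env()
assign("12808", c("GO:0001843", "GO:0005515"), envir = eg2go)

# "NM_000000" is a made-up accession that is absent from the toy map.
acc <- c("NM_172496", "NM_000000")

# get() stops with an error on a missing key; mget() can return NA instead.
eg <- mget(acc, envir = refseq2eg, ifnotfound = NA)

# Second hop, only for accessions that mapped to an Entrez Gene ID.
go <- lapply(eg, function(id) {
  if (is.na(id[1])) return(NA_character_)
  get(id, envir = eg2go)
})
print(go)
```

With the real maps, the same pattern would presumably be mget(acc, org.Mm.egREFSEQ2EG, ifnotfound = NA), but that is untested here and depends on the installed annotation version.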
Hi,

I too would like a simple way of getting from RefSeq to GOTERM(s).

What's the best package (and an example if possible) for getting the actual term information (rather than just the GO IDs, as in the reply above) from a RefSeq ID?

Thanks,

Steve
Hi Steve,

  Term(names(get(get("NM_172496", org.Mm.egREFSEQ2EG), org.Mm.egGO)))
             GO:0001843            GO:0005515
  "neural tube closure"     "protein binding"

HTH,
J.
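For more than one accession, the one-liner above can be wrapped in a small helper. The sketch below is only one possible way to do it and rests on several assumptions: it uses the select() interface of AnnotationDbi, which is not used anywhere in this thread; it assumes org.Mm.eg.db and GO.db are installed; and refseq_to_terms() is a hypothetical name, not a function from any package. Nothing runs unless both annotation packages are available.

```r
# Sketch only. refseq_to_terms() is a hypothetical helper, not a package function.
refseq_to_terms <- function(accessions) {
  # RefSeq accession -> Entrez Gene ID and GO ID (one row per annotation)
  eg_go <- AnnotationDbi::select(org.Mm.eg.db::org.Mm.eg.db,
                                 keys = accessions, keytype = "REFSEQ",
                                 columns = c("ENTREZID", "GO"))
  go_ids <- unique(stats::na.omit(eg_go$GO))
  if (length(go_ids) == 0L) {
    # No GO annotation at all: return the table with an empty TERM column
    eg_go$TERM <- NA_character_
    return(eg_go)
  }
  # GO ID -> human-readable term
  terms <- AnnotationDbi::select(GO.db::GO.db, keys = go_ids,
                                 keytype = "GOID", columns = "TERM")
  merge(eg_go, terms, by.x = "GO", by.y = "GOID", all.x = TRUE)
}

# Only attempt the lookup when the annotation packages are installed.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("GO.db", quietly = TRUE)) {
  print(refseq_to_terms("NM_172496"))
}
```

The rows returned depend on the annotation release, so they need not match the two terms shown above.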
On 05/21/2010 04:45 AM, James F. Reid wrote:
> Hi Steve,
>
> Term(names(get(get("NM_172496", org.Mm.egREFSEQ2EG), org.Mm.egGO)))
>            GO:0001843            GO:0005515
> "neural tube closure"     "protein binding"

I'm partial to

  library(org.Mm.eg.db)  # organism-specific library
  library(GO.db)         # GO ontology

  ## a vector of REFSEQ ids. in org.*eg.db packages the 'Lkey' is the
  ## 'eg' part of the package name, i.e., the ENTREZ gene id, while
  ## 'Rkey' is the part of the thing that is getting mapped to,
  ## 'mappedRkeys' are those keys that are in the present map
  ## so here we get the first three REFSEQ ids, to be used as
  ## an example
  rids <- mappedRkeys(head(org.Mm.egREFSEQ2EG, 3))

Then the maps

  egids <- org.Mm.egREFSEQ2EG[rids]         # REFSEQ to ENTREZ id
  goids <- org.Mm.egGO[mappedLkeys(egids)]  # ENTREZ to GO id
  terms <- GOTERM[mappedRkeys(goids)]       # GO to TERM

we could see what we've got, e.g.,

  toTable(terms)

or maybe

  unique(toTable(terms)[,c("go_id", "Term")])

or more explicitly

  r2eg <- toTable(egids)
  eg2go <- toTable(goids)
  go2term <- unique(toTable(terms)[,c('go_id', 'Term')])
  merge(merge(r2eg, eg2go), go2term)

The first few lines of which are

  > head(merge(merge(r2eg, eg2go), go2term))
         go_id gene_id    accession Evidence Ontology                Term
  1 GO:0001666  235623 NM_001001144      IMP       BP response to hypoxia
  2 GO:0003674   19783    NG_005612       ND       MF  molecular_function
  3 GO:0003674   22746 NM_001001130       ND       MF  molecular_function
  4 GO:0005515  235623 NM_001001144      IPI       MF     protein binding
  5 GO:0005575   19783    NG_005612       ND       CC  cellular_component
  6 GO:0005575   22746 NM_001001130       ND       CC  cellular_component

An alternative to map a single key might be

  Term(names(org.Mm.egGO[[ org.Mm.egREFSEQ2EG[["NM_172496"]] ]]))

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
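The nested merge() in the reply above needs no 'by' argument because each pair of tables shares exactly one column name. A small base R sketch with toy data frames may make that easier to see. The column names follow the toTable() output quoted above, and the single accession, gene ID, GO IDs and terms are the ones reported earlier in this thread. As a simplification, the Evidence and Ontology columns are left out, since the thread does not give them for this gene.

```r
# Toy tables mimicking toTable() output; values are taken from this thread.
r2eg <- data.frame(gene_id = "12808", accession = "NM_172496",
                   stringsAsFactors = FALSE)
eg2go <- data.frame(gene_id = c("12808", "12808"),
                    go_id = c("GO:0001843", "GO:0005515"),
                    stringsAsFactors = FALSE)
go2term <- data.frame(go_id = c("GO:0001843", "GO:0005515"),
                      Term = c("neural tube closure", "protein binding"),
                      stringsAsFactors = FALSE)

# merge() joins on the columns two tables have in common:
# gene_id for the inner merge, then go_id for the outer one.
joined <- merge(merge(r2eg, eg2go), go2term)
print(joined)
```

Each row of the result pairs one accession with one GO term, so an accession with several annotations appears on several rows.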