advice in building GOALLENTREZID {GO}


Vladimir Morozov ▴ 130

@vladimir-morozov-2740

Last seen 9.6 years ago

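Hi,

Can somebody advise on building GOALLENTREZID {GO}?

The 'GO' package description says:

    The annotation package was built using a downloadable R package - AnnBuilder (download and build your own) from www.bioconductor.org using the following public data sources:
    Entrez Gene: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA. Built: Source data downloaded from Entrez Gene on Wed Aug 29 09:09:16 2007
    Gene Ontology: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest. Built: Build Information not available

The AnnBuilder function

    GOPkgBuilder(pkgName, pkgPath, filename, version, author, lazyLoad=TRUE)

requires the GO ontology structure file (the filename argument). But how does it get the Entrez mapping?

A related question: is the GO package supposed to be updated? I see the old package version number in the Bioconductor development version.

Thanks
Vlad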


ADD COMMENT • link updated 16.0 years ago by Marc Carlson ★ 7.2k • written 16.0 years ago by Vladimir Morozov ▴ 130


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

Vladimir Morozov wrote:
> Hi,
>
> Can somebody advice in building GOALLENTREZID {GO}
>
> 'GO' package description tells:
> The annotation package was built using a downloadable R package -
> AnnBuilder (download and build your own) from www.bioconductor.org
> using the following public data sources:
> Entrez Gene: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA. Built: Source data
> downloaded from Entrez Gene on Wed Aug 29 09:09:16 2007
>
> Gene Ontology: ftp://ftp.geneontology.org/pub/go/godatabase/archive/latest.
> Built: Build Information not available
>
> AnnBuilder function:
> GOPkgBuilder(pkgName, pkgPath, filename, version, author, lazyLoad=TRUE)
> requires the GO ontology structure file (filename argument)
> But how does it get Entrez mapping?
>
> Kind of related question, is the GO package supposed to be updated. I
> see the old package version number in the Bioconductor development
> version
>
> Thanks
> Vlad

Hi Vlad,

I recommend that you NOT use the GOALLENTREZID field from GO. There is nothing really "wrong" with it, but it's the old way of doing things. The old annotation packages are going to go away soon.

Instead I would get the appropriate organism based package and use the appropriate map from there. So for example if you were using human, you would want the org.Hs.eg.db package and would want to use the mapping in the org.Hs.egGO2ALLEGS field.

Also, you probably want to switch over to the newer GO.db package instead of the old style GO package. We plan to deprecate the older GO package imminently. ;)

Marc
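For readers following along, the lookup Marc describes might be sketched as below. This is only an illustration, assuming the Bioconductor package org.Hs.eg.db is installed; the choice of term (simply the first key in the direct map) is arbitrary and not from the thread.

```r
# Sketch: compare the direct and the "transitive" GO -> Entrez Gene maps
# in the human organism package. Nothing is done if the package is missing.
has_pkg <- requireNamespace("org.Hs.eg.db", quietly = TRUE)

if (has_pkg) {
  library(org.Hs.eg.db)

  # Direct map: GO ID -> Entrez Gene IDs annotated to exactly that term
  go2eg <- as.list(org.Hs.egGO2EG)

  # "Transitive" map: GO ID -> Entrez Gene IDs annotated to the term
  # or to any of its offspring terms
  go2alleg <- as.list(org.Hs.egGO2ALLEGS)

  # Arbitrary illustrative term: the first GO ID in the direct map
  term <- names(go2eg)[1]
  print(length(go2eg[[term]]))     # genes annotated directly
  print(length(go2alleg[[term]]))  # genes including offspring terms
}
```

For other organisms the same pattern should apply with the corresponding org.Xx.eg.db package and map names.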

ADD COMMENT • link 16.0 years ago Marc Carlson ★ 7.2k


Marc,

I need the "transitive" (direct and "child" term) GO->Entrez mapping. Where is the Entrez mapping in GO.db?

Documentation for package 'GO.db' version 2.0.2

Help Pages

    GO             Bioconductor annotation data package
    GOBPANCESTOR   Annotation of GO Identifiers to their Biological Process Ancestors
    GOBPCHILDREN   Annotation of GO Identifiers to their Biological Process Children
    GOBPOFFSPRING  Annotation of GO Identifiers to their Biological Process Offspring
    GOBPPARENTS    Annotation of GO Identifiers to their Biological Process Parents
    GOCCANCESTOR   Annotation of GO Identifiers to their Cellular Component Ancestors
    GOCCCHILDREN   Annotation of GO Identifiers to their Cellular Component Children
    GOCCOFFSPRING  Annotation of GO Identifiers to their Cellular Component Offspring
    GOCCPARENTS    Annotation of GO Identifiers to their Cellular Component Parents
    GOMAPCOUNTS    Number of mapped keys for the maps in package GO.db
    GOMFANCESTOR   Annotation of GO identifiers to their Molecular Function Ancestors
    GOMFCHILDREN   Annotation of GO Identifiers to their Molecular Function Children
    GOMFOFFSPRING  Annotation of GO Identifiers to their Molecular Function Offspring
    GOMFPARENTS    Annotation of GO Identifiers to their Molecular Function Parents
    GOOBSOLETE     Annotation of GO identifiers by terms defined by Gene Ontology Consortium and their status are obsolete
    GOSYNONYM      Map from GO synonyms to GO terms
    GOTERM         Annotation of GO Identifiers to GO Terms
    GO_dbconn      Collect information about the package annotation DB
    GO_dbfile      Collect information about the package annotation DB
    GO_dbInfo      Collect information about the package annotation DB
    GO_dbschema    Collect information about the package annotation DB

Thanks
Vladimir

-----Original Message-----
From: Marc Carlson [mailto:mcarlson@fhcrc.org]
Sent: Tuesday, April 08, 2008 7:12 PM
To: Vladimir Morozov
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}

ADD REPLY • link 16.0 years ago Vladimir Morozov ▴ 130


Vladimir Morozov wrote:
> Marc,
>
> I need the "transitive" (direct and "child" term) GO->Entrez mapping.
> Where is Entrez mapping in GO.db?

Yes, those maps were deliberately not put into the newer GO.db package. Instead you can now find this information in the organism based packages as I described in my previous post.

In short, you need to look at the org.Xx.eg.db package for your species, where Xx is the first letter of the genus followed by the first letter of the species (Homo sapiens becomes Hs, Mus musculus becomes Mm, etc.).

Then you need to look at the org.Hs.egGO2ALLEGS and the org.Hs.egGO2EG mappings that the package contains (continuing the human example).

The problem with having that data in GO was that it munges together GO to Entrez Gene ID associations from several different organisms at the same time. Entrez Gene IDs are unique, so what we had before with these maps inside of GO is not really wrong, but we fear that someone could potentially become confused by this, and we want to help steer you guys towards getting the correct answers whenever possible. Plus, this map was already really huge and needed to be split up in order to prevent future versions of the GO package from swelling up into a "GOjira" package. ;)

Hope this helps,

Marc
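The point about pooling several organisms into one GO to gene map can be illustrated with a small base-R sketch. The rows below are made up; only the column order (tax_id, GeneID, GO_ID, Evidence) follows NCBI's gene2go file, and the GO and gene IDs are hypothetical placeholders.

```r
# Toy rows in the column order of NCBI's gene2go file.
# All gene and GO IDs here are invented for illustration.
gene2go <- data.frame(
  tax_id   = c(9606, 9606, 10090, 10090, 9606),
  GeneID   = c("101", "102", "201", "202", "103"),
  GO_ID    = c("GO:A", "GO:A", "GO:A", "GO:B", "GO:B"),
  Evidence = c("IDA", "TAS", "IDA", "ISS", "IDA"),
  stringsAsFactors = FALSE
)

# Pooled map: genes from both organisms end up under the same GO term
pooled <- split(gene2go$GeneID, gene2go$GO_ID)

# Per-organism map: restrict to one taxonomy ID first
# (9606 is Homo sapiens, 10090 is Mus musculus)
human       <- gene2go[gene2go$tax_id == 9606, ]
human_go2eg <- split(human$GeneID, human$GO_ID)

print(pooled[["GO:A"]])
print(human_go2eg[["GO:A"]])
```

In the pooled map the mouse gene sits alongside the human genes for the same term, which is the kind of confusion the per-organism packages are meant to avoid.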

ADD REPLY • link 16.0 years ago Marc Carlson ★ 7.2k


Now I got it! Thanks.

Can you provide the code to build GO{BP|MF|CC}OFFSPRING? I probably want to update it more often than biannually.

Thanks
Vlad

-----Original Message-----
From: Marc Carlson [mailto:mcarlson@fhcrc.org]
Sent: Wednesday, April 09, 2008 11:57 AM
To: Vladimir Morozov
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}

ADD REPLY • link 16.0 years ago Vladimir Morozov ▴ 130


Vladimir Morozov wrote:
> Now I got it! Thanks
>
> Can you provide the code to build GO{BP|MF|CC}OFFSPRING
> I probably want to update it more often than biannually

Yes, but if you do, and you want to use any other annotation, you must also build that. The annotation packages are inter-connected in non-trivial ways (many depending on GO), so mixing and matching is not a simple process, which is pretty much why we don't do it more often.

best wishes
Robert

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org

ADD REPLY • link 16.0 years ago rgentleman ★ 5.5k


> Yes, but if you do, and you want to use any other annotation, you must
> also build that. The annotation packages are inter-connected in
> non-trivial ways (many depending on GO), so mixing and matching is not
> a simple process, which is pretty much why we don't do it more often.

Essentially I need an updated GOALLENTREZID for GO enrichment analysis on Entrez lists. I don't think it depends on other annotation slots. Having the 'GO' 'OFFSPRING' lists, I can probably get GOALLENTREZID fairly easily:

    library(GO)  # old-style GO package: provides GO*OFFSPRING and GOALLENTREZID

    # Parse the Entrez GO "direct" mapping from NCBI
    # (V2 = Entrez Gene ID, V3 = GO ID, V4 = evidence code)
    gene2go <- read.delim('/home/data/public/GO/gene2go', header = FALSE, comment.char = "#")
    go2g <- as.character(gene2go$V2)
    names(go2g) <- gene2go$V4
    go2g <- split(go2g, gene2go$V3)

    # Get the Entrez GO "transitive" mapping using the 'GO' 'OFFSPRING' lists
    go2allg <- lapply(c('CC', 'BP', 'MF'), function(goType) {
      # Look up GOCCOFFSPRING, GOBPOFFSPRING or GOMFOFFSPRING by name
      xx <- as.list(get(paste('GO', goType, 'OFFSPRING', sep = '')))
      xx2 <- c(
        # Terms without offspring (NA): the term itself
        mapply(function(x) x, names(xx[is.na(xx)]), SIMPLIFY = FALSE),
        # Terms with offspring: the term plus all of its offspring
        mapply(function(x, y) c(x, y), names(xx[!is.na(xx)]), xx[!is.na(xx)], SIMPLIFY = FALSE)
      )
      lapply(xx2, function(x) unlist(unique(go2g[x])))
    })

    # Collapse into a one-level list
    go2allg <- unlist(go2allg, recursive = FALSE, use.names = TRUE)

This seems to be an updated 'GOALLENTREZID', excluding 'all':

    > length(go2allg)
    [1] 23678
    > xx <- as.list(GOALLENTREZID)
    > length(xx)
    [1] 23679
    > names(xx)[!(names(xx) %in% names(go2allg))]
    [1] "all"
    > xx$`GO:0000328`
         IDA      IDA      IDA      IDA      TAS      TAS      TAS
    "850875" "851514" "853290" "853912" "855343" "855949" "856649"
    > go2allg[1]
    $`GO:0000328`
          IDA       IDA       IDA       IDA       TAS       TAS       TAS       ISS
     "850875"  "851514"  "853290"  "853912"  "855343"  "855949"  "856649" "2543332"

So suggestions for parsing the GeneOntology files into the 'GO' 'OFFSPRING' environment would be appreciated.

Best,
Vladimir

-----Original Message-----
From: Robert Gentleman [mailto:rgentlem@fhcrc.org]
Sent: Wednesday, April 09, 2008 1:03 PM
To: Vladimir Morozov
Cc: Marc Carlson; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}
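On the open question of deriving OFFSPRING lists, one possible approach is sketched below in base R. It is not the code used to build the Bioconductor packages. It assumes the direct parent-to-child edges have already been read from the Gene Ontology files (that parsing is not shown), and every term ID, gene ID and the helper name `offspring_of` are hypothetical.

```r
# Toy ontology: each term lists its direct children (hypothetical IDs).
children <- list(
  "GO:1" = c("GO:2", "GO:3"),
  "GO:2" = "GO:4",
  "GO:3" = "GO:4",
  "GO:4" = character(0)
)

# Collect every descendant of a term by walking the children lists.
# offspring_of is an illustrative helper, not a Bioconductor function.
offspring_of <- function(term, children) {
  seen <- character(0)
  todo <- children[[term]]
  while (length(todo) > 0) {
    cur  <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% seen)) {
      seen <- c(seen, cur)
      todo <- c(todo, children[[cur]])
    }
  }
  sort(seen)
}

terms     <- setNames(names(children), names(children))
offspring <- lapply(terms, offspring_of, children = children)

# Direct term -> gene annotations (made-up gene IDs)
direct <- list("GO:2" = "g1", "GO:4" = c("g2", "g3"))

# Transitive map: genes on the term itself plus genes on any offspring
go2all <- lapply(terms, function(term) {
  unique(unlist(direct[c(term, offspring[[term]])], use.names = FALSE))
})

print(offspring[["GO:1"]])
print(go2all[["GO:1"]])
```

The `seen` check keeps a term reachable by two paths (here GO:4) from being visited twice, which matters because GO is a directed acyclic graph rather than a tree.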

ADD REPLY • link 16.0 years ago Vladimir Morozov ▴ 130
