Question: advice in building GOALLENTREZID {GO}
An embedded and charset-unspecified text was scrubbed... Name: not available URL: https://stat.ethz.ch/pipermail/bioconductor/attachments/20080408/853a28e2/attachment.pl
modified 11.7 years ago by Marc Carlson • written 11.7 years ago by Vladimir Morozov
Marc Carlson wrote:
Vladimir Morozov wrote:
> Marc,
>
> I need the "transitive" (direct and "child" term) GO->Entrez mapping.
> Where is Entrez mapping in GO.db?

Yes, those maps were deliberately not put into the newer GO.db package. Instead, you can now find this information in the organism-based packages, as I described in my previous post.

In short, you need to look at the org.Xx.eg.db package for your species, where Xx is the first letter of the genus and of the species (Homo sapiens becomes Hs, Mus musculus becomes Mm, etc.).

Then you need to look at the org.Hs.egGO2ALLEGS and the org.Hs.egGO2EG mappings that the package contains (continuing the human example).

The problem with having that data in GO was that it munged together GO to Entrez Gene ID associations from several different organisms at the same time. Entrez Gene IDs are unique, so what we had before with these maps inside of GO was not really wrong, but we fear that someone could potentially become confused by this, and we want to help steer you guys towards getting the correct answers whenever possible. Plus, this map was already really huge and needed to be split up in order to prevent future versions of the GO package from swelling up into a "GOjira" package. ;)

Hope this helps,

Marc
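To illustrate the lookup Marc describes, here is a minimal sketch for the human example. It assumes the Bioconductor package org.Hs.eg.db is installed (the code is skipped otherwise). The variable names `direct` and `transitive` are only illustrative and are not part of the package.

```r
# Sketch only: assumes org.Hs.eg.db is installed; skipped otherwise.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)

  # Direct annotations: GO term -> Entrez Gene IDs annotated to that term
  direct <- as.list(org.Hs.egGO2EG)

  # Transitive annotations: GO term -> Entrez Gene IDs annotated to the
  # term itself or to any of its offspring terms
  transitive <- as.list(org.Hs.egGO2ALLEGS)

  # Number of GO terms covered by each map
  print(length(direct))
  print(length(transitive))

  # Each element is a character vector of Entrez Gene IDs whose names
  # are the evidence codes of the annotations
  print(head(transitive[[1]]))
}
```

Because each organism has its own package, the same two lines with `org.Mm.egGO2EG` and `org.Mm.egGO2ALLEGS` would give the mouse mappings, assuming org.Mm.eg.db is installed.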
Now I got it! Thanks

Can you provide the code to build GO{BP|MF|CC}OFFSPRING? I probably want to update it more often than biannually.

Thanks
Vlad

-----Original Message-----
From: Marc Carlson [mailto:mcarlson@fhcrc.org]
Sent: Wednesday, April 09, 2008 11:57 AM
To: Vladimir Morozov
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}

[...]
> Yes, but if you do, and you want to use any other annotation, you must
> also build that. The annotation packages are inter-connected in
> non-trivial ways (many depending on GO), so mixing and matching is not
> a simple process, which is pretty much why we don't do it more often.

Essentially, I need an updated GOALLENTREZID for GO enrichment analysis on Entrez lists. I don't think it depends on other annotation slots. Given the 'GO' 'OFFSPRING' lists, I can probably get GOALLENTREZID fairly easily:

# Parse the Entrez GO "direct" mapping from NCBI.
# gene2go columns used: V2 = Entrez Gene ID, V3 = GO ID, V4 = evidence code
gene2go <- read.delim('/home/data/public/GO/gene2go', header = FALSE, comment.char = "#")
go2g <- as.character(gene2go$V2)
names(go2g) <- gene2go$V4
go2g <- split(go2g, gene2go$V3)

# Get the Entrez GO "transitive" mapping using the 'GO' 'OFFSPRING' lists
go2allg <- lapply(c('CC', 'BP', 'MF'), function(goType) {
  xx <- as.list(get(paste('GO', goType, 'OFFSPRING', sep = '')))
  xx2 <- c(
    # terms without offspring map to themselves only
    mapply(function(x) x, names(xx[is.na(xx)]), SIMPLIFY = FALSE),
    # all other terms map to themselves plus their offspring
    mapply(function(x, y) c(x, y), names(xx[!is.na(xx)]), xx[!is.na(xx)], SIMPLIFY = FALSE)
  )
  lapply(xx2, function(x) unlist(unique(go2g[x])))
})

# Collapse into a one-level list
go2allg <- unlist(go2allg, recursive = FALSE, use.names = TRUE)

# This seems to be an updated 'GOALLENTREZID', excluding 'all'
> length(go2allg)
[1] 23678
> xx <- as.list(GOALLENTREZID)
> length(xx)
[1] 23679
> names(xx)[!(names(xx) %in% names(go2allg))]
[1] "all"
>
> xx$`GO:0000328`
     IDA      IDA      IDA      IDA      TAS      TAS      TAS
"850875" "851514" "853290" "853912" "855343" "855949" "856649"
> go2allg[1]
$`GO:0000328`
      IDA       IDA       IDA       IDA       TAS       TAS       TAS       ISS
 "850875"  "851514"  "853290"  "853912"  "855343"  "855949"  "856649" "2543332"

So suggestions for parsing the GeneOntology files into the 'GO' 'OFFSPRING' environments would be appreciated.

Best,
Vladimir

-----Original Message-----
From: Robert Gentleman [mailto:rgentlem@fhcrc.org]
Sent: Wednesday, April 09, 2008 1:03 PM
To: Vladimir Morozov
Cc: Marc Carlson; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] advice in building GOALLENTREZID {GO}

Vladimir Morozov wrote:
> Now I got it! Thanks
>
> Can you provide the code to build GO{BP|MF|CC}OFFSPRING I probably
> want to update it more often than biannually

Yes, but if you do, and you want to use any other annotation, you must also build that. The annotation packages are inter-connected in non-trivial ways (many depending on GO), so mixing and matching is not a simple process, which is pretty much why we don't do it more often.

best wishes
Robert

> Thanks
> Vlad

--
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org
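The idea behind the "transitive" mapping discussed in this thread can be illustrated on a toy ontology in base R. This is only a sketch under stated assumptions: the term IDs, the gene IDs, and the names `edges`, `direct`, `offspring_of`, `offspring` and `go2all` are all hypothetical, and real GeneOntology files would first have to be parsed into a child/parent table like `edges`. It is not the code used to build the Bioconductor annotation packages.

```r
# Hypothetical toy ontology: each row is one child -> parent edge.
edges <- data.frame(
  child  = c("GO:B", "GO:C", "GO:D"),
  parent = c("GO:A", "GO:A", "GO:B"),
  stringsAsFactors = FALSE
)

# Hypothetical direct annotations: GO term -> gene IDs
direct <- list(
  "GO:A" = c("1"),
  "GO:B" = c("2"),
  "GO:C" = c("2", "3"),
  "GO:D" = c("4")
)

terms <- sort(unique(c(edges$child, edges$parent)))

# parent -> immediate children
children <- split(edges$child, edges$parent)

# Collect all descendants of one term by repeatedly expanding the frontier.
offspring_of <- function(term) {
  seen <- character(0)
  frontier <- children[[term]]
  while (length(frontier) > 0) {
    seen <- union(seen, frontier)
    nxt <- unlist(children[frontier], use.names = FALSE)
    frontier <- setdiff(nxt, seen)
  }
  seen
}

# Analogue of the OFFSPRING lists: term -> all descendant terms
offspring <- lapply(terms, offspring_of)
names(offspring) <- terms

# Analogue of the transitive map: genes annotated to the term itself
# or to any of its offspring, with duplicates removed
go2all <- lapply(terms, function(term) {
  unique(unlist(direct[c(term, offspring[[term]])], use.names = FALSE))
})
names(go2all) <- terms

print(offspring[["GO:A"]])
print(go2all[["GO:A"]])
```

In this toy case the root term GO:A collects the genes of every term below it, while a leaf such as GO:D keeps only its own direct annotations.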