Question

GOstats suggestion


Johannes Rainer

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070604/13384966/attachment.pl


score 0 · Answer 1 · 2007-06-04

Hi Johannes,

You are right that the current Category/GOstats implementations rely on Bioconductor annotation data packages being available. Taking the time to generate an annotation data package using AnnBuilder would have other benefits aside from being able to use the GOstats code, but I can sympathize with wanting a way to use these tools without going through that step first.

I'm not opposed to the idea of finding a way to let the GOstats tools operate without an annotation data package, but at present I won't have time to implement anything (what is there now suits our needs fairly well). So patches are welcome. :-)

"Johannes Rainer" <johannes.rainer at tcri.at> writes:

> Thanks for your suggestion; this would be a solution. But as far as I understand, each time the hyperGTest function is called, the functions from the GOstats and Category packages map the submitted IDs to GO terms using the annotation packages (i.e. the hgu133plus2 annotation packages). The mapping is actually performed in the getGoToEntrezMap function (Category package), and this function maps Entrez Gene IDs to GO terms by first mapping Affy IDs to GO terms and then Affy IDs to Entrez Gene IDs.

Yes, the mapping is recomputed for each call, and this could probably be improved. Indeed, as we transition to SQLite-based annotation data packages, many of the contortions of the current code can be avoided entirely. I'm not sure we can avoid computing the mapping for each call, because we need to filter the mapping based on the provided list of gene IDs.

> When I submit the Entrez Gene IDs of the selected genes and those of the gene universe, I do not need the information from the annotation packages that maps Affy IDs to Entrez Gene IDs and Affy IDs to GO terms. The mapping between GO terms and Entrez Gene IDs can be performed using the GO package, i.e.
>
> library(GO)
> GOLL <- as.list(get("GOALLENTREZID", mode = "environment"))
> # just removing all the GO IDs that are not mapped to any Entrez Gene ID
> GOLL <- GOLL[!is.na(GOLL)]
> # x holds Entrez Gene IDs, either from the gene universe or the selected ones
> PresentGO <- sapply(GOLL, function(z) {
>     if (length(z) == 0 || all(is.na(z)))
>         return(FALSE)
>     any(x %in% z)
> })
> GOLL <- GOLL[PresentGO]
>
> GOLL is then a list of all GO terms for the Entrez Gene IDs specified in x (containing all ontologies: MF, CC and BP).

Aside: The GOALLENTREZID map should probably be replaced with organism- and ontology-specific maps. The current map is huge, and if we were to use it as you are suggesting, I suspect it would be even slower than the current map generation to go through it, select the desired ontology, eliminate GO IDs with no annotations in the selected gene list, etc.

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
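Seth's aside lists the per-call work: select one ontology, keep only the genes in the supplied list, and drop GO terms left with no annotations. Below is a minimal base-R sketch of that filtering step, assuming the GO-to-Entrez Gene map is held as a named list (the shape `as.list()` of the old GOALLENTREZID environment would give) plus a separate term-to-ontology lookup. The toy data, the lookup vector `go2ontology` and the helper `filterGoMap()` are hypothetical illustrations, not part of GOstats or Category.

```r
# Hypothetical toy GO-to-Entrez Gene map (not real annotation data).
go2entrez <- list(
  "GO:0000001" = c("10", "20", "30"),
  "GO:0000002" = c("40"),
  "GO:0000003" = character(0),
  "GO:0000004" = c("20", "50")
)

# Hypothetical lookup of each GO term's ontology (BP, MF or CC).
go2ontology <- c("GO:0000001" = "BP", "GO:0000002" = "BP",
                 "GO:0000003" = "MF", "GO:0000004" = "BP")

# Restrict a GO-to-gene map to one ontology and to the genes in `universe`,
# then drop GO terms that are left with no annotated genes.
filterGoMap <- function(goMap, ontologyOf, ontology, universe) {
  # %in% treats terms missing from the lookup (NA) as not matching
  keep <- names(goMap)[ontologyOf[names(goMap)] %in% ontology]
  restricted <- lapply(goMap[keep], function(ids) intersect(ids, universe))
  restricted[lengths(restricted) > 0]
}

universe <- c("10", "20", "60")
bpMap <- filterGoMap(go2entrez, go2ontology, "BP", universe)
str(bpMap)
```

Because the result depends on the supplied gene list, the filtering has to be redone whenever the universe changes, which is the per-call cost Seth describes.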

score 0 · Answer 2 · 2007-06-04

Hello Johannes,

You can use Bioconductor's "AnnBuilder" package to produce a custom annotation package for your own selection of Entrez Gene IDs and then use GOstats with this custom annotation package. This may be more efficient than mapping Entrez Gene IDs to GO nodes each time you run GOstats. Having said that, I have to admit that it took some time to get AnnBuilder running due to its dependencies.

Best regards,
Joern

Johannes Rainer wrote:
> Dear Seth, dear Bioconductor members,
>
> As far as I understand, you are using the annotation package defined with the "annotation" parameter (GOHyperGParams) to map the submitted Entrez Gene IDs to GO terms. This works fine for Affymetrix arrays with available annotation packages, but we are, for example, also using Exon arrays and are annotating the probes on our own. My suggestion is to also support mapping from Entrez Gene IDs to GO terms using the GO package. This would allow GO analyses for all microarray platforms, not just Affymetrix arrays with available annotation packages.
>
> Sincerely, jo
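The efficiency argument for a custom annotation package is that the gene-to-GO lookup is built once rather than recomputed on every call. A minimal base-R sketch of that idea, assuming a GO-to-Entrez Gene map stored as a named list: invert it once into an Entrez-to-GO list and reuse the result afterwards. This is not what AnnBuilder does internally; `go2entrez` and `invertMap()` are hypothetical names, and the data are toy values.

```r
# Hypothetical toy GO-to-Entrez Gene map (not real annotation data).
go2entrez <- list(
  "GO:0000001" = c("10", "20"),
  "GO:0000002" = c("20", "30")
)

# Invert a GO-to-gene list into a gene-to-GO list. stack() turns the named
# list into a two-column data frame (values = gene ID, ind = GO term), and
# split() then groups the GO terms by gene ID.
invertMap <- function(goMap) {
  pairs <- stack(goMap)
  split(as.character(pairs$ind), pairs$values)
}

# Build the lookup once; later analyses reuse entrez2go instead of
# recomputing the mapping.
entrez2go <- invertMap(go2entrez)
entrez2go[["20"]]
```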

score 0 · Answer 3 · 2007-06-06


Seth Falcon

Hi Johannes,

I forgot to mention the possibility of using the humanLLMappings package as an annotation source. Does this do what you want? It is simply Entrez Gene ID based, but organism specific.

+ seth

--
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org


score 0 · Answer 4 · 2007-06-07

Hi Jo,

I looked at all the responses to your GOstats question, and I'm wondering why no one has mentioned the topGO package. It seems to do what you want: you define your universe however you want, and you don't have to use Affy annotation.

Cheers,
Dick

Richard P. Beyer, Ph.D.
University of Washington
Env. & Occ. Health Sci., Box 354695
4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
Tel.: (206) 616 7378
Fax: (206) 685 4696
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer

------------------------------
Date: Mon, 4 Jun 2007 16:56:42 +0200
From: "Johannes Rainer" <johannes.rainer@tcri.at>
Subject: Re: [BioC] GOstats suggestion

Thanks for your suggestion; this would be a solution. But as far as I understand, each time the hyperGTest function is called, the functions from the GOstats and Category packages map the submitted IDs to GO terms using the annotation packages (i.e. the hgu133plus2 annotation packages). The mapping is actually performed in the getGoToEntrezMap function (Category package), and this function maps Entrez Gene IDs to GO terms by first mapping Affy IDs to GO terms and then Affy IDs to Entrez Gene IDs.

When I submit the Entrez Gene IDs of the selected genes and those of the gene universe, I do not need the information from the annotation packages that maps Affy IDs to Entrez Gene IDs and Affy IDs to GO terms. The mapping between GO terms and Entrez Gene IDs can be performed using the GO package, i.e.

library(GO)
GOLL <- as.list(get("GOALLENTREZID", mode = "environment"))
# just removing all the GO IDs that are not mapped to any Entrez Gene ID
GOLL <- GOLL[!is.na(GOLL)]
# x holds Entrez Gene IDs, either from the gene universe or the selected ones
PresentGO <- sapply(GOLL, function(z) {
    if (length(z) == 0 || all(is.na(z)))
        return(FALSE)
    any(x %in% z)
})
GOLL <- GOLL[PresentGO]

GOLL is then a list of all GO terms for the Entrez Gene IDs specified in x (containing all ontologies: MF, CC and BP).

I think using the GO/Entrez Gene mapping from the GO package would not restrict the GO analysis to platforms/microarrays for which annotation packages exist...

Sincerely, jo

--
Johannes Rainer, MSc
Tyrolean Cancer Research Institute
Innrain 66, 6020 Innsbruck, Austria
Tel.: +43 512 570485 33
Email: johannes.rainer at tcri.at, johannes.rainer at tugraz.at
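Johannes's point is that the test itself only needs three inputs: a GO-to-Entrez Gene map, the gene universe and the selected genes. Below is a minimal base-R sketch of a per-term one-sided hypergeometric test in the spirit of hyperGTest, not its actual implementation. It assumes the map is a named list of character Entrez IDs, uses hypothetical toy data and a hypothetical helper `hyperGPvalues()`, and ignores the GO graph structure and multiple-testing correction.

```r
# Hypothetical toy GO-to-Entrez Gene map (not real annotation data).
go2entrez <- list(
  "GO:0000001" = c("1", "2", "3", "4"),
  "GO:0000002" = c("5", "6", "7", "8", "9", "10")
)
universe <- as.character(1:20)       # all Entrez Gene IDs on the platform
selected <- c("1", "2", "3", "11")   # e.g. differentially expressed genes

# One-sided hypergeometric p-value per GO term: the probability of seeing at
# least as many selected genes annotated to the term as were observed.
hyperGPvalues <- function(goMap, selected, universe) {
  selected <- intersect(selected, universe)
  N <- length(universe)              # universe size
  k <- length(selected)              # number of selected genes
  vapply(goMap, function(ids) {
    ids <- intersect(ids, universe)
    m <- length(ids)                       # universe genes in the term
    q <- length(intersect(ids, selected))  # selected genes in the term
    # P(X >= q) = P(X > q - 1)
    phyper(q - 1, m, N - m, k, lower.tail = FALSE)
  }, numeric(1))
}

pvals <- hyperGPvalues(go2entrez, selected, universe)
pvals
```

For GO:0000001, 3 of the 4 selected genes fall in a 4-gene term out of a 20-gene universe, giving p = 65/4845 (about 0.013); GO:0000002 contains none of the selected genes, so its p-value is 1.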