Mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6


Dick Beyer ★ 1.4k


Hi Adrian,

Thanks very much for your reply. Your example for building the topGO object was very helpful.

Another question: Do you have a favorite way to summarize the topGO output? What I am trying to do is something like CateGOrizer:

http://www.animalgenome.org/bioinfo/tools/catego/

which uses higher-level GO terms to give a summary overview of the enriched GO terms.

Thanks very much,
Dick

*******************************************************************************
Richard P. Beyer, Ph.D.   University of Washington
Tel.: (206) 616 7378      Env. & Occ. Health Sci., Box 354695
Fax: (206) 685 4696       4225 Roosevelt Way NE, # 100
                          Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

On Wed, 3 Mar 2010, Adrian Alexa wrote:

> Hi Dick,
>
> as Sean already mentioned, org.Mm.egGO2EG contains only the most
> specific GO annotations. topGO doesn't care whether you supply the most
> specific gene-to-GO mappings or the complete mappings. You will obtain
> the same result if you use either org.Mm.egGO2EG or
> org.Mm.egGO2ALLEGS. However, due to the redundancies in the
> org.Mm.egGO2ALLEGS mappings, I advise using the most specific
> mappings.
>
> Also, since you are using a Bioconductor annotation package, you don't
> need to construct the gene2GO list to provide the annotations. There
> is a function, namely "annFUN.org", which is more convenient to use
> when building the "topGOdata" object. In this case the instantiation
> of a topGOdata object will look like:
>
> GOdata <- new("topGOdata",
>               ontology = "BP",
>               allGenes = geneList,
>               nodeSize = 5,
>               annot = annFUN.org,
>               mapping = "org.Mm.eg.db",
>               ID = "entrez")
>
> The "mapping" argument tells which annotation package to use and the
> "ID" argument selects which of the gene identifiers to use.
>
> You can also use functions from topGO to access the genes annotated to
> a GO term of interest.
>
> # all the genes annotated to GO:0030522 -- NOT only the most specific ones!
> myGenes <- genesInTerm(GOdata, "GO:0030522")
>
> # the number of annotated genes
> no.myGenes <- countGenesInTerm(GOdata, "GO:0030522")
>
> Hope this helps. Let me know if you have additional questions.
>
> Regards,
> Adrian
>
> On Wed, Mar 3, 2010 at 7:32 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hi Sean,
>>
>> Thanks very much for looking into this. I guess I need to think about this.
>> What is confusing to me is that topGO takes a gene2GO list as input (a list of
>> GO terms for each gene), which I generated from org.Mm.egGO2EG (no
>> GO:0030522, for example). Getting GOIDs out of topGO that are in
>> org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should build
>> my gene2GO input list from org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG.
>>
>> I also didn't dig far enough when I checked GO:0030522 at geneontology.org,
>> which showed 34 gene products for Mus musculus. However, had I looked
>> further I would have seen that GO:0030522 has no genes of its own.
>>
>> Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz for
>> getting Entrez Gene ID/GOID mappings, but switched to the Bioconductor
>> org.Mm.eg.db way as it is much simpler.
>>
>> Thanks for the good education!
>>
>> Cheers,
>> Dick
>> *******************************************************************************
>> Richard P. Beyer, Ph.D.   University of Washington
>> Tel.: (206) 616 7378      Env. & Occ. Health Sci., Box 354695
>> Fax: (206) 685 4696       4225 Roosevelt Way NE, # 100
>>                           Seattle, WA 98105-6099
>> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
>> http://staff.washington.edu/~dbeyer
>> *******************************************************************************
>>
>> On Tue, 2 Mar 2010, Sean Davis wrote:
>>
>>> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been running topGO (using mouse Entrez Gene IDs) and found that some
>>>> GO terms that turn up in the topGO analysis are not in the GO terms from
>>>> org.Mm.eg.db.
>>>>
>>>> I'd like to give some example code to show how to generate the problem,
>>>> but my topGO code is a lot of lines. The output looks like:
>>>>
>>>> allResults[[1]][[1]][1:2,]
>>>>          GO.ID                                Term Annotated Significant Expected classic    elim weight
>>>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>>>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>>>
>>>> So, the topGO output gives a column of GOIDs and such.
>>>>
>>>> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094,
>>>> GO:0031497, GO:0046700.
>>>>
>>>> I can't find these in names(Mm.egGO2EG).
>>>>
>>>> library("org.Mm.eg.db")
>>>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>>>> grep("GO:0030522",names(Mm.egGO2EG))
>>>> integer(0)
>>>>
>>>> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?
>>>> When I check for GO:0030522 for Mus musculus at geneontology.org,
>>>> GO:0030522 is valid.
>>>>
>>>> I'm puzzled by the mismatch. I want to get the genes for a given GOID,
>>>> so there is probably a workaround. If anyone has a suggestion or idea, I'd
>>>> be very grateful to know what to try.
>>>>
>>>
>>> Hi, Dick.
>>>
>>> The Gene Ontology, as I'm sure everyone knows, is hierarchical. The
>>> org.Mm.egGO2EG table stores ONLY the most specific term for each gene.
>>> However, org.Mm.egGO2ALLEGS stores, for each term, all the genes for
>>> itself AND its children. Most of the gene ontology analysis
>>> algorithms use the latter definition; it looks like topGO does also.
>>> In short, try this:
>>>
>>> get('GO:0030522',org.Mm.egGO2ALLEGS)
>>>      IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
>>>  "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
>>>      IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
>>>  "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
>>>      IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
>>>  "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
>>>      IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
>>>  "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
>>>      IMP      ISA      IDA      IDA      IMP      IDA
>>>  "56351"  "56847"  "59035"  "67488" "224903" "232174"
>>>
>>> Hope that helps clear things up.
>>>
>>> Sean
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
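
For readers following the thread, the difference between the most-specific mapping (org.Mm.egGO2EG style) and the complete mapping (org.Mm.egGO2ALLEGS style) can be sketched in base R with a toy hierarchy. This is only an illustration under stated assumptions: the term IDs ("T:root", "T:mid", ...), the gene names, and the helper `ancestors()` are all hypothetical, and evidence codes are ignored. It is not how topGO or the annotation package is implemented.

```r
# Toy GO-like hierarchy (hypothetical IDs): each term lists its direct parents.
parents <- list(
  "T:root"  = character(0),
  "T:mid"   = "T:root",
  "T:leafA" = "T:mid",
  "T:leafB" = "T:mid"
)

# Most-specific annotations, analogous to org.Mm.egGO2EG: term -> genes.
direct <- list(
  "T:leafA" = c("g1", "g2"),
  "T:leafB" = c("g2", "g3")
)

# Hypothetical helper: return a term together with all of its ancestors,
# found by walking the parent links.
ancestors <- function(term) {
  seen <- character(0)
  todo <- term
  while (length(todo) > 0) {
    current <- todo[1]
    todo <- todo[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      todo <- c(todo, parents[[current]])
    }
  }
  seen
}

# Propagate every gene to its own term and to each ancestor term,
# analogous to org.Mm.egGO2ALLEGS (minus the evidence codes).
all_terms <- names(parents)
complete <- setNames(vector("list", length(all_terms)), all_terms)
for (term in names(direct)) {
  for (anc in ancestors(term)) {
    complete[[anc]] <- sort(unique(c(complete[[anc]], direct[[term]])))
  }
}

"T:mid" %in% names(direct)  # FALSE: no gene is annotated directly to T:mid
complete[["T:mid"]]         # "g1" "g2" "g3": inherited from its children
```

Here "T:mid" plays the role of GO:0030522 in the thread: it has no genes of its own, so it is absent from the most-specific mapping, yet it has genes in the complete mapping and can therefore show up in an enrichment result.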

Annotation GO Mus musculus PROcess topGO • 1.2k views

14.2 years ago • Dick Beyer ★ 1.4k
