mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6
Dick Beyer ★ 1.4k
Hi Adrian,

Thanks very much for your reply. Your example for building the topGO object was very helpful.

Another question: Do you have a favorite way to summarize the topGO output? What I am trying to do is something like CateGOrizer:

http://www.animalgenome.org/bioinfo/tools/catego/

which uses higher-level GO terms to give a summary overview of the enriched GO terms.

Thanks very much,
Dick
*******************************************************************************
Richard P. Beyer, Ph.D.     University of Washington
Tel.:(206) 616 7378         Env. & Occ. Health Sci. , Box 354695
Fax: (206) 685 4696         4225 Roosevelt Way NE, # 100
                            Seattle, WA 98105-6099
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
*******************************************************************************

On Wed, 3 Mar 2010, Adrian Alexa wrote:

> Hi Dick,
>
> as Sean already mentioned, org.Mm.egGO2EG contains only the most
> specific GO annotations. topGO doesn't care whether you supply the most
> specific gene-to-GO mappings or the complete mappings. You will obtain
> the same result if you use either org.Mm.egGO2EG or
> org.Mm.egGO2ALLEGS. However, due to the redundancies in the
> org.Mm.egGO2ALLEGS mappings I advise using the most specific
> mappings.
>
> Also, since you are using a Bioconductor annotation package, you don't
> need to construct the gene2GO list to provide the annotations. There
> is a function, namely "annFUN.org", which is more convenient to use
> when building the "topGOdata" object. In this case the instantiation
> of a topGOdata object will look like:
>
> GOdata <- new("topGOdata",
>               ontology = "BP",
>               allGenes = geneList,
>               nodeSize = 5,
>               annot = annFUN.org,
>               mapping = "org.Mm.eg.db",
>               ID = "entrez")
>
> The "mapping" argument tells which annotation package to use and the
> "ID" argument selects which of the gene identifiers to use.
>
> You can also use functions from topGO to access the genes annotated to
> a GO term of interest.
>
> # all the genes annotated to GO:0030522 -- NOT only the most specific ones!
> myGenes <- genesInTerm(GOdata, "GO:0030522")
>
> # the number of annotated genes
> no.myGenes <- countGenesInTerm(GOdata, "GO:0030522")
>
> Hope this helps. Let me know if you have additional questions.
>
> Regards,
> Adrian
>
> On Wed, Mar 3, 2010 at 7:32 AM, Dick Beyer <dbeyer at u.washington.edu> wrote:
>> Hi Sean,
>>
>> Thanks very much for looking into this. I guess I need to think about this.
>> What is confusing to me is that topGO takes a gene2GO list as input (a list
>> of GO terms for each gene), which I generated from org.Mm.egGO2EG (no
>> GO:0030522, for example). Getting GOIDs out of topGO that are in
>> org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should build
>> my gene2GO input list from org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG.
>>
>> I also didn't dig far enough when I checked GO:0030522 at geneontology.org,
>> which showed 34 gene products for Mus musculus. However, had I looked
>> further I would have seen that GO:0030522 has no genes of its own.
>>
>> Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz for
>> getting Entrez Gene ID/GOID mappings, but switched to the Bioconductor
>> org.Mm.eg.db way as it is much simpler.
>>
>> Thanks for the good education!
>>
>> Cheers,
>> Dick
>> *******************************************************************************
>> Richard P. Beyer, Ph.D.     University of Washington
>> Tel.:(206) 616 7378         Env. & Occ. Health Sci. , Box 354695
>> Fax: (206) 685 4696         4225 Roosevelt Way NE, # 100
>>                             Seattle, WA 98105-6099
>> http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
>> http://staff.washington.edu/~dbeyer
>> *******************************************************************************
>>
>> On Tue, 2 Mar 2010, Sean Davis wrote:
>>
>>> On Tue, Mar 2, 2010 at 7:15 PM, Dick Beyer <dbeyer at u.washington.edu>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been running topGO (using mouse Entrez Gene IDs) and found that some
>>>> GO terms that turn up in the topGO analysis are not in the GO terms from
>>>> org.Mm.eg.db.
>>>>
>>>> I'd like to give some example code to show how to generate the problem,
>>>> but my topGO code is a lot of lines. The output looks like:
>>>>
>>>> allResults[[1]][[1]][1:2,]
>>>>          GO.ID                                Term Annotated Significant Expected classic    elim weight
>>>> 714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
>>>> 762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434
>>>>
>>>> So, the topGO output gives a column of GOIDs and such.
>>>>
>>>> Some of the problem GOIDs from topGO are GO:0030522, GO:0051094,
>>>> GO:0031497, GO:0046700.
>>>>
>>>> I can't find these in names(Mm.egGO2EG).
>>>>
>>>> library("org.Mm.eg.db")
>>>> Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
>>>> grep("GO:0030522",names(Mm.egGO2EG))
>>>> integer(0)
>>>>
>>>> Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db?
>>>> When I check for GO:0030522 for Mus musculus at geneontology.org,
>>>> GO:0030522 is valid.
>>>>
>>>> I'm puzzled by the mismatch. I want to get the genes for a given GOID,
>>>> so there is probably a workaround. If anyone has a suggestion or idea, I'd
>>>> be very grateful to know what to try.
>>>>
>>>
>>> Hi, Dick.
>>>
>>> The Gene Ontology, as I'm sure everyone knows, is hierarchical. The
>>> org.Mm.egGO2EG table stores ONLY the most specific term for each gene.
>>> However, org.Mm.egGO2ALLEGS stores the term and all the genes for
>>> itself AND its children. Most of the gene ontology analysis
>>> algorithms use the latter definition; it looks like topGO does also.
>>> In short, try this:
>>>
>>> get('GO:0030522',org.Mm.egGO2ALLEGS)
>>>      IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
>>>  "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
>>>      IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
>>>  "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
>>>      IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
>>>  "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
>>>      IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
>>>  "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
>>>      IMP      ISA      IDA      IDA      IMP      IDA
>>>  "56351"  "56847"  "59035"  "67488" "224903" "232174"
>>>
>>> Hope that helps clear things up.
>>>
>>> Sean
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
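
For readers following the thread, the distinction Sean describes (direct annotations versus annotations propagated up the hierarchy) can be sketched in base R with a toy ontology. Everything here is an assumption made for illustration: the term IDs ("GO:A" to "GO:D"), the gene IDs, and the helper `ancestors()` are hypothetical, and this is not how topGO or org.Mm.eg.db are implemented. `go2eg` plays the role of org.Mm.egGO2EG and `go2allegs` the role of org.Mm.egGO2ALLEGS.

```r
# Toy hierarchy (made-up IDs): each term maps to its direct parents.
parents <- list(
  "GO:A" = character(0),  # root
  "GO:B" = "GO:A",
  "GO:C" = "GO:B",
  "GO:D" = "GO:B"
)

# Direct ("most specific") annotations only, analogous to org.Mm.egGO2EG.
# Note that GO:B has no genes of its own, like GO:0030522 in the thread.
go2eg <- list(
  "GO:C" = c("g1", "g2"),
  "GO:D" = c("g2", "g3")
)

# Hypothetical helper: collect all ancestors of a term by walking parent links.
ancestors <- function(term, parents) {
  seen <- character(0)
  todo <- parents[[term]]
  while (length(todo) > 0) {
    cur <- todo[1]
    todo <- todo[-1]
    if (!(cur %in% seen)) {
      seen <- c(seen, cur)
      todo <- c(todo, parents[[cur]])
    }
  }
  seen
}

# Propagate each term's genes to the term itself and to all of its ancestors,
# analogous to what org.Mm.egGO2ALLEGS stores.
go2allegs <- list()
for (term in names(go2eg)) {
  for (t in c(term, ancestors(term, parents))) {
    go2allegs[[t]] <- union(go2allegs[[t]], go2eg[[term]])
  }
}

"GO:B" %in% names(go2eg)   # FALSE: no direct annotations
go2allegs[["GO:B"]]        # genes inherited from GO:C and GO:D
```

In this sketch the parent term is absent from the direct mapping but picks up genes once annotations are propagated, which mirrors why a term such as GO:0030522 can appear in topGO results without appearing in names(org.Mm.egGO2EG).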
Annotation GO Mus musculus PROcess topGO