Mismatch in GO terms between topGO_1.14.0 and org.Mm.eg.db_2.3.6
Dick Beyer ★ 1.4k
@dick-beyer-26
Hello,

I've been running topGO (using mouse Entrez Gene IDs) and found that some GO terms that turn up in the topGO analysis are not in the GO terms from org.Mm.eg.db.

I'd like to give some example code to show how to generate the problem, but my topGO code is a lot of lines. The output looks like:

    allResults[[1]][[1]][1:2,]
             GO.ID                                Term Annotated Significant Expected classic    elim weight
    714 GO:0019222     regulation of metabolic process      2498         143   107.08 0.00010 0.17956 0.9057
    762 GO:0006807 nitrogen compound metabolic process      3413         186   146.31 0.00011 0.45337 0.9434

So, the topGO output gives a column of GOIDs and such.

Some of the problem GOIDs from topGO are GO:0030522, GO:0051094, GO:0031497, GO:0046700.

I can't find these in names(Mm.egGO2EG).

    library("org.Mm.eg.db")
    Mm.egGO2EG <- as.list(org.Mm.egGO2EG)
    grep("GO:0030522",names(Mm.egGO2EG))
    integer(0)

Is it possible that topGO depends on GO.db, and I'm using org.Mm.eg.db? When I check for GO:0030522 for Mus musculus at geneontology.org, GO:0030522 is valid.

I'm puzzled by the mismatch. I want to get the genes for a given GOID, so there is probably a work around. If anyone has a suggestion or idea, I'd be very grateful to know what to try.
Thanks very much,
Dick

Here is my session info:

    sessionInfo()
    R version 2.10.0 (2009-10-26)
    x86_64-redhat-linux-gnu

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=C
     [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] grid      stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] limma_3.2.1         topGO_1.14.0        SparseM_0.83        graph_1.24.1        GO.db_2.3.5         org.Mm.eg.db_2.3.6  RSQLite_0.7-3
     [8] DBI_0.2-4           AnnotationDbi_1.8.1 Biobase_2.6.0       biomaRt_2.2.0       gplots_2.7.4        caTools_1.10        bitops_1.0-4.1
    [15] gdata_2.6.1         gtools_2.6.1

    loaded via a namespace (and not attached):
    [1] lattice_0.17-26 RCurl_1.3-0     tools_2.10.0    XML_2.6-0

*******************************************************************************
Richard P. Beyer, Ph.D.
University of Washington
Env. & Occ. Health Sci., Box 354695
4225 Roosevelt Way NE, # 100
Seattle, WA 98105-6099
Tel.: (206) 616 7378
Fax: (206) 685 4696
http://depts.washington.edu/ceeh/ServiceCores/FC5/FC5.html
http://staff.washington.edu/~dbeyer
@sean-davis-490
Hi, Dick.

The Gene Ontology, as I'm sure everyone knows, is hierarchical. The org.Mm.egGO2EG map lists, for each GO term, ONLY the genes annotated directly to that term (that is, each gene appears under its most specific terms). The org.Mm.egGO2ALLEGS map, however, lists for each term the genes annotated to the term itself AND to any of its child terms. Most gene ontology analysis algorithms use the latter definition; it looks like topGO does also. That is why a term such as GO:0030522 can show up in the topGO results without being a name in org.Mm.egGO2EG.

In short, try this:

    get('GO:0030522',org.Mm.egGO2ALLEGS)
         IDA      IMP      IDA      IGI      IMP      IGI      IMP      IMP
     "11835"  "11835"  "11848"  "12034"  "12034"  "13082"  "13123"  "13983"
         IMP      ISO      IMP      IDA      IMP      IMP      IMP      ISO
     "14228"  "14599"  "14602"  "14815"  "14815"  "15502"  "16000"  "16000"
         IDA      IDA      IMP      IDA      IGI      IMP      IMP      IDA
     "16601"  "18667"  "18854"  "19213"  "19378"  "19378"  "19411"  "20181"
         IDA      IDA      IMP      IMP      IMP      IPI      IDA      IGI
     "20182"  "20183"  "20779"  "21815"  "21848"  "22215"  "24074"  "27401"
         IMP      ISA      IDA      IDA      IMP      IDA
     "56351"  "56847"  "59035"  "67488" "224903" "232174"

The names of the returned vector are GO evidence codes and the values are Entrez Gene IDs; a gene appears more than once when it is supported by more than one evidence code.

Hope that helps clear things up.

Sean
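To make the difference between the two maps concrete, here is a minimal sketch in base R on a toy hierarchy. Everything in it is an assumption made up for illustration: the term names, the gene IDs, and the helper `ancestors_and_self` are hypothetical, and this is not how the annotation packages build their tables internally. It only shows how propagating direct annotations up the hierarchy gives a parent term genes even though nothing is annotated to it directly.

```r
# Toy hierarchy: each term lists its direct parents (hypothetical term names).
parents <- list(
  "GO:child1" = "GO:parent",
  "GO:child2" = "GO:parent",
  "GO:parent" = "GO:root",
  "GO:root"   = character(0)
)

# Direct annotations only, playing the role of org.Mm.egGO2EG (made-up genes).
go2eg <- list(
  "GO:child1" = c("g1", "g2"),
  "GO:child2" = c("g2", "g3")
)

# Hypothetical helper: return a term together with all of its ancestors.
ancestors_and_self <- function(term) {
  seen <- character(0)
  queue <- term
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% seen)) {
      seen <- c(seen, current)
      queue <- c(queue, parents[[current]])
    }
  }
  seen
}

# Copy each direct annotation to the term and every ancestor,
# playing the role of org.Mm.egGO2ALLEGS.
go2allegs <- list()
for (term in names(go2eg)) {
  for (target in ancestors_and_self(term)) {
    go2allegs[[target]] <- unique(c(go2allegs[[target]], go2eg[[term]]))
  }
}

"GO:parent" %in% names(go2eg)       # no direct annotations for the parent
"GO:parent" %in% names(go2allegs)   # but it is present after propagation
sort(go2allegs[["GO:parent"]])      # genes inherited from both children
```

In this toy data "GO:parent" behaves like GO:0030522 in the thread: it is missing from the direct map but picks up the genes of its children in the propagated one.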
Hi Sean,

Thanks very much for looking into this. I guess I need to think about this. What is confusing to me is that topGO takes a gene2GO list as input (a list of GO terms for each gene), which I generated from org.Mm.egGO2EG (no GO:0030522, for example). Getting GOIDs out of topGO that are in org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG makes me think I should build my gene2GO input list from org.Mm.egGO2ALLEGS rather than org.Mm.egGO2EG.

I also didn't dig far enough when I checked GO:0030522 at geneontology.org, which showed 34 gene products for Mus musculus. Had I looked further, I would have seen that GO:0030522 has no genes of its own.

Until recently, I used ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz for getting Entrez Gene ID to GOID mappings, but switched to the Bioconductor org.Mm.eg.db way as it is much simpler.

Thanks for the good education!

Cheers,
Dick
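For the gene2GO input discussed above, the GO-to-genes list has to be inverted into a genes-to-GO list. Below is a minimal base R sketch of that inversion. The small `go2genes` list is a hypothetical stand-in for `as.list(org.Mm.egGO2EG)`, and the GO/gene pairings in it are invented for illustration, not real annotations.

```r
# Hypothetical GO-to-genes list of direct annotations
# (stand-in for as.list(org.Mm.egGO2EG); the pairings are invented).
go2genes <- list(
  "GO:0000001" = c("11835", "12034"),
  "GO:0000002" = c("12034", "13082")
)

# Flatten to a two-column table: values = gene ID, ind = GO ID.
long <- stack(go2genes)

# Regroup by gene, so that each gene maps to the GO IDs it is annotated to.
gene2GO <- split(as.character(long$ind), long$values)

gene2GO[["12034"]]   # the GO IDs recorded for this gene
```

As far as I know, topGO ships a helper called `inverseList` for this kind of inversion, and it propagates the direct annotations up the GO graph itself when the topGOdata object is built, which would explain why a gene2GO list built from org.Mm.egGO2EG still yields parent terms such as GO:0030522 in the results. If that holds for your version, `genesInTerm` on the topGOdata object should return the genes counted under a given term; please check the topGO documentation before relying on either point.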