GOstats, query between the GO set from geneIdsByCategory and the GO set from org.Ce.egGO
余淼 ▴ 10
@-6344
Last seen 10.2 years ago
Dear Bioconductor developers and users,

I'm using GOstats for an enrichment analysis. The result is as follows:

> summary(r)
      GOCCID       Pvalue OddsRatio   ExpCount Count Size Term
1 GO:0030131 2.533063e-08 123.95833 0.11423841     5    9 ...
2 GO:0030117 3.487024e-08  52.26471 0.22847682     6   18 ...
3 GO:0048475 3.487024e-08  52.26471 0.22847682     6   18 ...
4 GO:0030118 5.024033e-08  99.11111 0.12693157     5   10 ...

The result reports a 'Count' and a 'Size' value for each term. I used geneIdsByCategory(r)[["GO:0030117"]] and geneIdUniverse(r)[["GO:0030117"]] to get the genes behind 'Count' and 'Size', respectively.

> geneIdsByCategory(r)[["GO:0030117"]]
[1] "172180" "173121" "173701" "175940" "180713" "186194"

> geneIdUniverse(r)[["GO:0030117"]]
 [1] "171860" "171952" "172180" "172553" "173121" "173304" "173701"
 [8] "174675" "175376" "175940" "177891" "178078" "178183" "179387"
[15] "180317" "180713" "181163" "186194"

At the same time, I used org.Ce.egGO to look up 'GO:0030117' and get the genes in this set. There are only 8 genes in this set.

> ceGO <- toTable(org.Ce.egGO)
> ceGO[ceGO$go_id == "GO:0030117",]
      gene_id      go_id Evidence Ontology
31601  171860 GO:0030117      IEA       CC
32214  172553 GO:0030117      IEA       CC
32743  173304 GO:0030117      IEA       CC
32927  173701 GO:0030117      IEA       CC
34046  175376 GO:0030117      IEA       CC
34333  175750 GO:0030117      IEA       CC
35977  177891 GO:0030117      IEA       CC
38309  181163 GO:0030117      IEA       CC

What confuses me is the difference in the number of genes between geneIdUniverse(r)[["GO:0030117"]] (18 genes) and ceGO[ceGO$go_id == "GO:0030117",] (8 genes). Why are they different? Or have I made a mistake somewhere in the process?

I hope you can help me!

Best,
MiaoYu
Dan Du ▴ 210
@dan-du-5270
Last seen 10 months ago
Germany
Hi Miao,

The discrepancy you are seeing arises because org.Ce.egGO gives only the gene IDs that are directly associated with the GO term, not those associated with its children, as described in the man page:

"org.Ce.egGO is an R object that provides mappings between entrez gene identifiers and the GO identifiers that they are directly associated with. This mapping and its reverse mapping do NOT associate the child terms from the GO ontology with the gene. Only the directly evidenced terms are represented here.

org.Ce.egGO2ALLEGS is an R object that provides mappings between a given GO identifier and all of the Entrez Gene identifiers annotated at that GO term OR TO ONE OF IT'S CHILD NODES in the GO ontology. Thus, this mapping is much larger and more inclusive than org.Ce.egGO2EG."

So a more inclusive table you may want to look at is org.Ce.egGO2ALLEGS. Other sources of variation could be related to your hyperGTest settings (such as universeGeneIds).

HTH,
Dan

On Mon, 2014-01-20 at 19:45 +0800, 余淼 wrote:
> What confused me is the difference in the number of genes between
> geneIdUniverse(r)[["GO:0030117"]] and ceGO[ceGO$go_id == "GO:0030117",].
> Why are they different from each other? Or have I made a mistake
> somewhere in the process?
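The two effects mentioned above (annotations inherited from child terms, and restriction to the test universe) can be illustrated with a small base-R sketch. Everything in it is hypothetical toy data: the term names, gene names, and the helper functions descendants() and all_genes() are made up for illustration and are not part of GOstats or org.Ce.eg.db.

```r
# Toy GO fragment (hypothetical IDs, not real annotations):
# each child term maps to its direct parent.
parents <- list(
  "GO:child1"     = "GO:parent",
  "GO:child2"     = "GO:parent",
  "GO:grandchild" = "GO:child1"
)

# Direct annotations only, analogous to what org.Ce.egGO reports.
direct <- list(
  "GO:parent"     = c("g1", "g2"),
  "GO:child1"     = c("g3"),
  "GO:child2"     = c("g2", "g4"),
  "GO:grandchild" = c("g5")
)

# Collect a term together with all of its descendants (recursive walk).
descendants <- function(term) {
  kids <- names(parents)[vapply(parents, function(p) term %in% p, logical(1))]
  unique(c(term, unlist(lapply(kids, descendants))))
}

# Genes annotated to the term or to any descendant,
# analogous to what org.Ce.egGO2ALLEGS reports.
all_genes <- function(term) {
  sort(unique(unlist(direct[descendants(term)], use.names = FALSE)))
}

direct_n <- length(direct[["GO:parent"]])   # directly annotated genes only
all_n    <- length(all_genes("GO:parent"))  # including child terms

# A test universe (assumed here) further restricts the set: genes outside
# the universe drop out even if they are annotated to the term.
universe <- c("g1", "g3", "g4", "g5", "g9")
size <- length(intersect(all_genes("GO:parent"), universe))

cat("direct:", direct_n, " with children:", all_n, " in universe:", size, "\n")
```

In the original data the same pattern seems to appear: seven of the eight directly annotated genes are in geneIdUniverse(r)[["GO:0030117"]], while 175750 is not, which would be consistent with it lying outside the universe used for the test. With the annotation package loaded, the inclusive gene set could presumably be inspected with something like mget("GO:0030117", org.Ce.egGO2ALLEGS) and compared against the 18 genes above.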