Question

hyperGTest results I do not understand

0

Entering edit mode

Ariel Chernomoretz ▴ 170

@ariel-chernomoretz-1863

Last seen 16 months ago

Argentina

Dear list, I am learning how results reported by hyperGTest funcion are calculated but I am getting into trouble with some results I do not understand... In the following example 'selectedEntrezIds' is a list of 1507 non- duplicated modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122 non-duplicated ENTREZ ids taken as the universe. Here is the code: > hgCutoff <- 0.05 > params <- new("GOHyperGParams", + geneIds=selectedEntrezIds, + universeGeneIds=entrezUniverse, + annotation="hgu133plus2", + ontology="BP", + pvalueCutoff=hgCutoff, + conditional=TRUE, + testDirection="over") > hgOver.BP <- hyperGTest(params) > summary( hgOver.BP)[1,-7] ID Pvalue OddsRatio ExpCount Count Size 1 GO:0007229 0.002376785 7.049593 13.80089 13 15 For this particular node I think that the corresponding contingency table can be written as: selected ~selected gonode 13 2 ~gonode 1494 1613 for which ExpCount should be 15*1507/3122 = 7.24, and not 13.8 as is reported. (The pvalue I am getting is also a little bit different: 0.002428757 with phyper, 0.002429 with fisher exact test) For other go nodes I am even getting ExpCount values greater than the node size! What am I missing here? Thanks Ariel./

GO GO • 1.5k views

ADD COMMENT • link updated 17.8 years ago by Seth Falcon ★ 7.4k • written 17.8 years ago by Ariel Chernomoretz ▴ 170

score 0 · Answer 1 · 2007-02-27

0

Entering edit mode

Seth Falcon ★ 7.4k

@seth-falcon-992

Last seen 10.3 years ago

Hi Ariel, As always, it would help to know what versions you are using. Please send the output of sessionInfo() (after running the example code). If you aren't using the most current version of Category and GOstats, it would be good to try updating. Read on for a few thoughts... ariel at df.uba.ar writes: > I am learning how results reported by hyperGTest funcion are calculated > but I am getting into trouble with some results I do not understand... > In the following example 'selectedEntrezIds' is a list of 1507 non- duplicated > modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122 > non-duplicated ENTREZ ids taken as the universe. > > Here is the code: >> hgCutoff <- 0.05 >> params <- new("GOHyperGParams", > + geneIds=selectedEntrezIds, > + universeGeneIds=entrezUniverse, > + annotation="hgu133plus2", > + ontology="BP", > + pvalueCutoff=hgCutoff, > + conditional=TRUE, > + testDirection="over") 1. The input universeGeneIds are filtered to remove IDs that have no annotation in the specified GO ontology. So if some of the 3122 Entrez IDs you specified have no GO BP annotation, they will have been removed from the universe. You can obtain the a list mapping GO IDs to Entrez IDs using geneIdUniverse(hgOver.BP). So the effective universe is unique(unlist(geneIdUniverse(hgOver.BP))). 2. You ran the hyperGTest with conditional=TRUE. See the vignette and paper for details. You can obtain the conditional universe using condGeneIdUniverse(hgOver.BP). If this extra info doesn't help, I'd be willing to take a closer look if you can send the selected and universe lists (offline, of course). + seth

ADD COMMENT • link 17.8 years ago Seth Falcon ★ 7.4k

0

Entering edit mode

Hi Seth, and thanks for your prompt response I do not think that this is a conditional analysis issue because the node I am focus on has no children. I did forget to take into account a specific ontology to make the contingency table. However I am still puzzled because even if I do filter on Ontology, the expected count value inferred from the new table does not agree with the one reported by GOstats either. New contingency table for BP: selected ~selected gonode 13 2 ~gonode 1230 1334 which gives ExpCount=7.229 (and not 13.8) So, if you do not mind, I accept your offer and I will send you, in private email, my selected and universe lists for you to check them. Thank you so much in advance Ariel./ I am running R 2.4 on Fedora 5 and here is the output of sessionInfo(): other attached packages: limma affy affyio Rgraphviz geneplotter xtable "2.9.1" "1.12.0" "1.2.0" "1.12.1" "1.12.0" "1.3-2" RColorBrewer GOstats Category KEGG RBGL GO "0.2-3" "2.0.0" "2.0.0" "1.14.0" "1.10.0" "1.14.0" graph genefilter survival annotate Biobase mouse4302 "1.12.0" "1.12.0" "2.29" "1.12.0" "1.12.0" "1.14.0" hgu133plus2 "1.14.0" Seth Falcon <sfalcon at="" fhcrc.org=""> ha escrito: > Hi Ariel, > > As always, it would help to know what versions you are using. Please > send the output of sessionInfo() (after running the example code). If > you aren't using the most current version of Category and GOstats, it > would be good to try updating. Read on for a few thoughts... > > ariel at df.uba.ar writes: >> I am learning how results reported by hyperGTest funcion are calculated >> but I am getting into trouble with some results I do not understand... >> In the following example 'selectedEntrezIds' is a list of 1507 >> non-duplicated >> modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122 >> non-duplicated ENTREZ ids taken as the universe. >> >> Here is the code: >>> hgCutoff <- 0.05 >>> params <- new("GOHyperGParams", >> + geneIds=selectedEntrezIds, >> + universeGeneIds=entrezUniverse, >> + annotation="hgu133plus2", >> + ontology="BP", >> + pvalueCutoff=hgCutoff, >> + conditional=TRUE, >> + testDirection="over") > > 1. The input universeGeneIds are filtered to remove IDs that have no > annotation in the specified GO ontology. So if some of the 3122 > Entrez IDs you specified have no GO BP annotation, they will have > been removed from the universe. You can obtain the a list mapping > GO IDs to Entrez IDs using geneIdUniverse(hgOver.BP). So the > effective universe is unique(unlist(geneIdUniverse(hgOver.BP))). > > 2. You ran the hyperGTest with conditional=TRUE. See the vignette and > paper for details. You can obtain the conditional universe using > condGeneIdUniverse(hgOver.BP). > > If this extra info doesn't help, I'd be willing to take a closer look > if you can send the selected and universe lists (offline, of course). > > + seth >

ADD REPLY • link 17.8 years ago Ariel Chernomoretz ▴ 170

score 0 · Answer 2 · 2007-02-27

ariel at df.uba.ar writes: > Hi Seth, and thanks for your prompt response > > I do not think that this is a conditional analysis issue because > the node I am focus on has no children. > > I did forget to take into account a specific ontology to make > the contingency table. However I am still puzzled because even if I do > filter on Ontology, the expected count value inferred from the new table > does not agree with the one reported by GOstats either. > > New contingency table for BP: > > selected ~selected > gonode 13 2 > ~gonode 1230 1334 > > which gives ExpCount=7.229 (and not 13.8) > > > So, if you do not mind, I accept your offer and I will send you, in > private email, my selected and universe lists for you to check them. > Thank you so much in advance There was a bug that may be related to this issue that has been fixed. The current versions for use with R 2.4.x are GOstats 2.0.4 and Category 2.0.3. > I am running R 2.4 on Fedora 5 and here is the output of sessionInfo(): > > other attached packages: > RColorBrewer GOstats Category KEGG RBGL GO > "0.2-3" "2.0.0" "2.0.0" "1.14.0" "1.10.0" "1.14.0" The fine print of my offer to look at your data was that you confirm the issue with the latest versions ;-) Can you try getting the latest and giving this one more try. You could update like this: source("http://bioconductor.org/biocLite.R") biocLite(c("Category", "GOstats")) Or to get updates for all of your installed packages (may take awhile, but is a good idea since updates usually mean bug fixes): library("Biobase") update.packages(repos=biocReposList()) Best, + seth