Dear list,
I am learning how results reported by hyperGTest funcion are
calculated
but I am getting into trouble with some results I do not understand...
In the following example 'selectedEntrezIds' is a list of 1507 non-
duplicated
modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122
non-duplicated ENTREZ ids taken as the universe.
Here is the code:
> hgCutoff <- 0.05
> params <- new("GOHyperGParams",
+ geneIds=selectedEntrezIds,
+ universeGeneIds=entrezUniverse,
+ annotation="hgu133plus2",
+ ontology="BP",
+ pvalueCutoff=hgCutoff,
+ conditional=TRUE,
+ testDirection="over")
> hgOver.BP <- hyperGTest(params)
> summary( hgOver.BP)[1,-7]
ID Pvalue OddsRatio ExpCount Count Size
1 GO:0007229 0.002376785 7.049593 13.80089 13 15
For this particular node I think that the corresponding contingency
table
can be written as:
selected ~selected
gonode 13 2
~gonode 1494 1613
for which ExpCount should be 15*1507/3122 = 7.24, and not 13.8 as is
reported.
(The pvalue I am getting is also a little bit different: 0.002428757
with phyper, 0.002429 with fisher exact test)
For other go nodes I am even getting ExpCount values greater than the
node size!
What am I missing here?
Thanks
Ariel./
Hi Ariel,
As always, it would help to know what versions you are using. Please
send the output of sessionInfo() (after running the example code). If
you aren't using the most current version of Category and GOstats, it
would be good to try updating. Read on for a few thoughts...
ariel at df.uba.ar writes:
> I am learning how results reported by hyperGTest funcion are
calculated
> but I am getting into trouble with some results I do not
understand...
> In the following example 'selectedEntrezIds' is a list of 1507 non-
duplicated
> modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122
> non-duplicated ENTREZ ids taken as the universe.
>
> Here is the code:
>> hgCutoff <- 0.05
>> params <- new("GOHyperGParams",
> + geneIds=selectedEntrezIds,
> + universeGeneIds=entrezUniverse,
> + annotation="hgu133plus2",
> + ontology="BP",
> + pvalueCutoff=hgCutoff,
> + conditional=TRUE,
> + testDirection="over")
1. The input universeGeneIds are filtered to remove IDs that have no
annotation in the specified GO ontology. So if some of the 3122
Entrez IDs you specified have no GO BP annotation, they will have
been removed from the universe. You can obtain the a list mapping
GO IDs to Entrez IDs using geneIdUniverse(hgOver.BP). So the
effective universe is unique(unlist(geneIdUniverse(hgOver.BP))).
2. You ran the hyperGTest with conditional=TRUE. See the vignette and
paper for details. You can obtain the conditional universe using
condGeneIdUniverse(hgOver.BP).
If this extra info doesn't help, I'd be willing to take a closer look
if you can send the selected and universe lists (offline, of course).
+ seth
Hi Seth, and thanks for your prompt response
I do not think that this is a conditional analysis issue because
the node I am focus on has no children.
I did forget to take into account a specific ontology to make
the contingency table. However I am still puzzled because even if I do
filter on Ontology, the expected count value inferred from the new
table
does not agree with the one reported by GOstats either.
New contingency table for BP:
selected ~selected
gonode 13 2
~gonode 1230 1334
which gives ExpCount=7.229 (and not 13.8)
So, if you do not mind, I accept your offer and I will send you, in
private email, my selected and universe lists for you to check them.
Thank you so much in advance
Ariel./
I am running R 2.4 on Fedora 5 and here is the output of
sessionInfo():
other attached packages:
limma affy affyio Rgraphviz geneplotter
xtable
"2.9.1" "1.12.0" "1.2.0" "1.12.1" "1.12.0"
"1.3-2"
RColorBrewer GOstats Category KEGG RBGL
GO
"0.2-3" "2.0.0" "2.0.0" "1.14.0" "1.10.0"
"1.14.0"
graph genefilter survival annotate Biobase
mouse4302
"1.12.0" "1.12.0" "2.29" "1.12.0" "1.12.0"
"1.14.0"
hgu133plus2
"1.14.0"
Seth Falcon <sfalcon at="" fhcrc.org=""> ha escrito:
> Hi Ariel,
>
> As always, it would help to know what versions you are using.
Please
> send the output of sessionInfo() (after running the example code).
If
> you aren't using the most current version of Category and GOstats,
it
> would be good to try updating. Read on for a few thoughts...
>
> ariel at df.uba.ar writes:
>> I am learning how results reported by hyperGTest funcion are
calculated
>> but I am getting into trouble with some results I do not
understand...
>> In the following example 'selectedEntrezIds' is a list of 1507
>> non-duplicated
>> modulated ENTREZ ids, included in 'entrezUniverse', a list of 3122
>> non-duplicated ENTREZ ids taken as the universe.
>>
>> Here is the code:
>>> hgCutoff <- 0.05
>>> params <- new("GOHyperGParams",
>> + geneIds=selectedEntrezIds,
>> + universeGeneIds=entrezUniverse,
>> + annotation="hgu133plus2",
>> + ontology="BP",
>> + pvalueCutoff=hgCutoff,
>> + conditional=TRUE,
>> + testDirection="over")
>
> 1. The input universeGeneIds are filtered to remove IDs that have no
> annotation in the specified GO ontology. So if some of the 3122
> Entrez IDs you specified have no GO BP annotation, they will have
> been removed from the universe. You can obtain the a list
mapping
> GO IDs to Entrez IDs using geneIdUniverse(hgOver.BP). So the
> effective universe is unique(unlist(geneIdUniverse(hgOver.BP))).
>
> 2. You ran the hyperGTest with conditional=TRUE. See the vignette
and
> paper for details. You can obtain the conditional universe using
> condGeneIdUniverse(hgOver.BP).
>
> If this extra info doesn't help, I'd be willing to take a closer
look
> if you can send the selected and universe lists (offline, of
course).
>
> + seth
>
ariel at df.uba.ar writes:
> Hi Seth, and thanks for your prompt response
>
> I do not think that this is a conditional analysis issue because
> the node I am focus on has no children.
>
> I did forget to take into account a specific ontology to make
> the contingency table. However I am still puzzled because even if I
do
> filter on Ontology, the expected count value inferred from the new
table
> does not agree with the one reported by GOstats either.
>
> New contingency table for BP:
>
> selected ~selected
> gonode 13 2
> ~gonode 1230 1334
>
> which gives ExpCount=7.229 (and not 13.8)
>
>
> So, if you do not mind, I accept your offer and I will send you, in
> private email, my selected and universe lists for you to check them.
> Thank you so much in advance
There was a bug that may be related to this issue that has been
fixed. The current versions for use with R 2.4.x are GOstats 2.0.4
and Category 2.0.3.
> I am running R 2.4 on Fedora 5 and here is the output of
sessionInfo():
>
> other attached packages:
> RColorBrewer GOstats Category KEGG RBGL
GO
> "0.2-3" "2.0.0" "2.0.0" "1.14.0" "1.10.0"
"1.14.0"
The fine print of my offer to look at your data was that you confirm
the issue with the latest versions ;-)
Can you try getting the latest and giving this one more try. You
could update like this:
source("http://bioconductor.org/biocLite.R")
biocLite(c("Category", "GOstats"))
Or to get updates for all of your installed packages (may take awhile,
but is a good idea since updates usually mean bug fixes):
library("Biobase")
update.packages(repos=biocReposList())
Best,
+ seth
> The fine print of my offer to look at your data was that you confirm
> the issue with the latest versions ;-)
yeah! You were right.
I updated both, GOstats and Category and now I get the right expected
number!
Thanks
Ariel./
Seth Falcon <sfalcon at="" fhcrc.org=""> ha escrito:
> ariel at df.uba.ar writes:
>
>> Hi Seth, and thanks for your prompt response
>>
>> I do not think that this is a conditional analysis issue because
>> the node I am focus on has no children.
>>
>> I did forget to take into account a specific ontology to make
>> the contingency table. However I am still puzzled because even if I
do
>> filter on Ontology, the expected count value inferred from the new
table
>> does not agree with the one reported by GOstats either.
>>
>> New contingency table for BP:
>>
>> selected ~selected
>> gonode 13 2
>> ~gonode 1230 1334
>>
>> which gives ExpCount=7.229 (and not 13.8)
>>
>>
>> So, if you do not mind, I accept your offer and I will send you, in
>> private email, my selected and universe lists for you to check
them.
>> Thank you so much in advance
>
> There was a bug that may be related to this issue that has been
> fixed. The current versions for use with R 2.4.x are GOstats 2.0.4
> and Category 2.0.3.
>
>> I am running R 2.4 on Fedora 5 and here is the output of
sessionInfo():
>>
>> other attached packages:
>> RColorBrewer GOstats Category KEGG RBGL
>> GO
>> "0.2-3" "2.0.0" "2.0.0" "1.14.0" "1.10.0"
>> "1.14.0"
>
> The fine print of my offer to look at your data was that you confirm
> the issue with the latest versions ;-)
>
> Can you try getting the latest and giving this one more try. You
> could update like this:
>
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("Category", "GOstats"))
>
> Or to get updates for all of your installed packages (may take
awhile,
> but is a good idea since updates usually mean bug fixes):
>
> library("Biobase")
> update.packages(repos=biocReposList())
>
>
> Best,
>
> + seth
>