About the "Size" column in the output of hyperGTest (GOstats package)
@gregory-voisin-945
Last seen 9.9 years ago
Canada
Hi,

I need a clarification about the "Size" column in the output of hyperGTest (GOstats package).

In https://stat.ethz.ch/pipermail/bioconductor/2006-December/015346.html we can read: "The "Size" column is the number of genes annotated at the given GO term (where genes are restricted to the defined gene universe)"

Hence, for a given term and a given platform, this number should be constant.

Let me explain. The first data set, A, contains 687 probesets. I ran hyperGTest on it; this is an extract from the result:

GOBPID Pvalue OddsRatio ExpCount Count Size Term
36 GO:0008283 0.0180640706 1.913970 8.20236088 15 266 cell proliferation

If I understand correctly, 266 probesets on the Affy hgu133plus2 array are annotated "cell proliferation".

Then I ran the same analysis on a second set (B), inclusive of A: 414 probesets. Result:

GOBPID Pvalue OddsRatio ExpCount Count Size Term
20 GO:0008283 0.008295992 1.765957 14.97834828 25 745 cell proliferation

Here, this means that 745 probesets are annotated "cell proliferation".

Why is the Size for the same term not the same in both analyses?

Moreover, since B is inclusive of A, the 25 probesets annotated "cell proliferation" found in the B analysis are reduced to 15 probesets in the A analysis. Normally, in the A analysis, I should have at least 25 probesets annotated "cell proliferation". Why didn't I find at least 25 probesets in the A analysis?

Thanks
Greg

> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=French_Canada.1252;LC_CTYPE=French_Canada.1252;LC_MONETARY=French_Canada.1252;LC_NUMERIC=C;LC_TIME=French_Canada.1252

attached base packages:
[1] splines tools stats graphics grDevices utils datasets methods base

other attached packages:
[1] GOstats_2.8.0 Category_2.8.4 genefilter_1.22.0 survival_2.34-1 RBGL_1.18.0 annotate_1.20.1 xtable_1.5-4
[8] graph_1.20.0 GO.db_2.2.5 hgu133plus2.db_2.2.5 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.4.3 Biobase_2.2.2

loaded via a namespace (and not attached):
[1] cluster_1.11.11 GSEABase_1.4.0 XML_1.99-0
GO hgu133plus2 affy • 1.6k views
Marc Carlson ★ 7.2k
@marc-carlson-2264
Last seen 8.3 years ago
United States
Hi Greg,

I am a little confused by your description of what you did. Even after reading your explanation closely, I am still not certain that I understand what your question is. However, there is a nice description of how the gene universe can affect the number of things you find in the GOstats vignette titled "Hypergeometric Tests Using GOstats". Perhaps this can help you?

http://www.bioconductor.org/packages/devel/bioc/html/GOstats.html

Marc
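To make the point about the gene universe concrete, here is a minimal sketch in Python (not GOstats code). It follows the definition quoted in the question: Size is the number of genes annotated at a term after restricting to the gene universe, and Count further restricts to the selected genes. The gene IDs, the two universes, and the helper names `term_size` and `term_count` are all hypothetical and exist only for illustration.

```python
# Toy illustration with made-up gene IDs; this is not how GOstats is
# implemented, only a sketch of the quoted definition of "Size".

def term_size(term_genes, universe):
    """Genes annotated at the term, restricted to the gene universe."""
    return len(set(term_genes) & set(universe))

def term_count(term_genes, universe, selected):
    """Selected genes annotated at the term, restricted to the universe."""
    return len(set(term_genes) & set(universe) & set(selected))

# Hypothetical annotation for one GO term.
term_genes = {"g1", "g2", "g3", "g4", "g5"}

# Two hypothetical universes for the same platform.
universe_small = {"g1", "g2", "g6", "g7"}
universe_large = {"g1", "g2", "g3", "g4", "g6", "g7", "g8"}

# Hypothetical list of selected (e.g. differentially expressed) genes.
selected = {"g1", "g3", "g6"}

size_small = term_size(term_genes, universe_small)
size_large = term_size(term_genes, universe_large)
count_small = term_count(term_genes, universe_small, selected)
count_large = term_count(term_genes, universe_large, selected)
```

In this toy setting the same term gets a different Size whenever the universe actually used by the test differs, so comparing the effective universes of the two analyses may be a useful first check.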
tlaguna ▴ 10
@tlaguna-12370
Last seen 7.8 years ago
IMB Mainz

I think what Greg was trying to say is that if you run two analyses using the same gene universe, it is not clear why the "Size" reported for the same GO term differs between them. I did not realize this happens until a collaborator pointed it out, and I am also confused by the difference. Marc, can you explain it please? Thank you!

 

PS: An example

Gene set enr: 1480

GOBPID Pvalue OddsRatio ExpCount Count Size Term
GO:0098609 0.0000012 2.2956084 38 64 149 cell-cell adhesion

 

Gene set enr: 68

GOBPID Pvalue OddsRatio ExpCount Count Size Term
GO:0098609 0.0005064 3.5353535 4 12 342 cell-cell adhesion

 

GENE UNIVERSE (same for both analyses): 5811

Why is the "Size" (149 vs. 342) different between the two analyses?
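One check that can be run on the numbers above: under the hypergeometric null, the expected count is Size × (number of selected genes) / (universe size). The sketch below is plain Python, not GOstats code. It assumes that "Gene set enr" is the number of selected genes and that 5811 is the universe used in both tests; the function names are hypothetical.

```python
from math import comb

def expected_count(size, n_selected, n_universe):
    """Expected number of selected genes at the term under the null."""
    return size * n_selected / n_universe

def upper_tail_p(count, size, n_selected, n_universe):
    """P(X >= count) for a hypergeometric X (over-representation test)."""
    upper = min(size, n_selected)
    # Sum exact integer numerators, then divide once to limit rounding error.
    numerator = sum(
        comb(size, k) * comb(n_universe - size, n_selected - k)
        for k in range(count, upper + 1)
    )
    return numerator / comb(n_universe, n_selected)

# Values taken from the two result rows above (assumed interpretation).
expected_a = expected_count(149, 1480, 5811)  # about 37.9, reported ExpCount 38
expected_b = expected_count(342, 68, 5811)    # about 4.0, reported ExpCount 4

p_a = upper_tail_p(64, 149, 1480, 5811)
p_b = upper_tail_p(12, 342, 68, 5811)
```

Both reported ExpCount values agree with their reported Size under these assumptions, which suggests each Size is the value the test really used rather than a display glitch. It does not by itself explain why the two Sizes differ.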


Trying to resurrect a thread from 7 years ago isn't the ideal way to proceed. Instead, submit a new question. Also, show your code and the output from sessionInfo().


I think the goal in every forum is to avoid duplicate threads, which is why I reopened a thread from 7 years ago. Indeed, the question was never answered. But OK, will do.
