HyperGtest interpretation

Guest User ★ 13k

@guest-user-4897

Dear all,

I know this is a silly quation but I am having trouble interpreting the table of the hypergeometric test result.

I know that the p-value is the significance value that the obtained GO term is not by chance, but I don't know what the ExpCount and odds ratio mean.

Thank you,
Maria

updated 12.5 years ago by James W. MacDonald 68k • written 12.5 years ago by Guest User ★ 13k

James W. MacDonald 68k

@james-w-macdonald-5106

United States

Hi Maria,

On 6/12/2013 9:37 AM, Maria [guest] wrote:
> Dear all,
>
> I know this is a silly quation but I am having trouble interpreting the table of the hyper geometric test result.

I wouldn't say it is a silly question (or quation, for that matter ;-D).

> I know that the p-value is the significance value that the obtained go term is not by chance, but I don't know what the expcount and odds ratio mean.

The ExpCount is the expected count of genes with the given GO term under the null distribution. The goal of the test is to find GO terms that are 'enriched' in your set of significant genes. In practice this means we are looking for GO terms for which there are more genes annotated to that term in your set of significant genes than we would expect.

In each row there are three columns that give counts. The 'Count' column is the count of genes that are annotated to that GO ID in your set of significant genes. The 'Size' column is the number of such genes that are on the array, and the 'ExpCount' column gives the expected number of such genes if there were no enrichment.

As an example, let's say there are 200 significant genes, and 20,000 genes on the array, and there are 500 genes on the array that are annotated to GO:0000001. The ExpCount is the expected number of genes annotated to GO:0000001 if we were to randomly select 200 genes from the 20,000 on the array. If you get much more or less than the expected number, then this is not likely to arise by chance, so we assume that it occurred because the set of 200 genes you selected is 'enriched' for that GO term.

The odds ratio isn't IMO that helpful in this context. The general interpretation of an odds ratio is that we are comparing the odds of something happening to one group as compared to another. In epidemiological studies this is a reasonable thing to compute. As an example, you could look at smokers and non-smokers and count up the number of each that got lung cancer. If you then compute the odds ratio, you calculate the odds of getting lung cancer if you are a smoker versus the odds if you are not a smoker (and oddly enough, the odds are higher for smokers - whodathunk?).

In this context, the thing that occurs (like getting cancer in the example above) is that a gene is selected as being significant. So the odds ratio gives the odds of being selected given that a gene is annotated to GO:0000001 as compared to the odds of being selected given that a gene is NOT annotated to GO:0000001. Which IMO doesn't have an intuitive interpretation in this context.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
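To make the arithmetic in the answer above concrete, here is a minimal Python sketch using the numbers from the example (200 significant genes, 20,000 genes on the array, 500 annotated to the GO term). The thread does not give an observed 'Count', so the value of 15 below is a hypothetical chosen only for illustration. This is a sketch of the underlying calculation under those assumptions, not the code the package runs, and real output may differ (for instance, if the gene universe is filtered before testing).

```python
from math import comb

# Numbers taken from the example in the answer.
universe = 20_000   # genes on the array
selected = 200      # significant genes
annotated = 500     # genes on the array annotated to the GO term ('Size')

# ASSUMPTION: hypothetical observed overlap ('Count'); not given in the thread.
count = 15

# Expected overlap if the 200 genes were drawn at random from the 20,000.
exp_count = selected * annotated / universe


def upper_tail(k, universe, annotated, selected):
    """P(X >= k) when drawing `selected` genes without replacement."""
    not_annotated = universe - annotated
    top = min(annotated, selected)
    ways = sum(
        comb(annotated, i) * comb(not_annotated, selected - i)
        for i in range(k, top + 1)
    )
    return ways / comb(universe, selected)


# One-sided p-value for over-representation.
p_value = upper_tail(count, universe, annotated, selected)

# 2x2 table: selected / not selected versus annotated / not annotated.
sel_ann = count
sel_not = selected - count
unsel_ann = annotated - count
unsel_not = universe - selected - unsel_ann

# Odds of being selected given annotated, divided by
# odds of being selected given not annotated.
odds_ratio = (sel_ann / unsel_ann) / (sel_not / unsel_not)

print(f"ExpCount   = {exp_count}")
print(f"p-value    = {p_value:.3g}")
print(f"odds ratio = {odds_ratio:.3f}")
```

With these inputs the expected count is 200 * 500 / 20000 = 5, so a hypothetical count of 15 is three times what random selection would give. In R, the same upper-tail probability could be obtained with `phyper(14, 500, 19500, 200, lower.tail = FALSE)`.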

12.5 years ago • James W. MacDonald 68k
