goseq - overrepresented p-values
Gu Mi ▴ 30
@gu-mi-4717
Dear All:

I have a question about the Bioconductor goseq package (http://www.bioconductor.org/packages/release/bioc/html/goseq.html) for GO enrichment analysis, which takes length bias into consideration. The top-ranked categories are obtained by ranking on "over_represented_pvalue" in the goseq output. The same output also includes "under_represented_pvalue".

How are over- and under-representation determined? Why don't we consider under-representation?

I am not sure whether I can think of a category being "enriched" this way: if there are more DE genes in a particular category, then the category is "enriched" and the associated p-value is called "over-represented", while if there are fewer DE genes in a particular category, then the category is "depleted" and called "under-represented". Can this be reflected in the sign (+/-) of certain quantities?

I am new to this area, so thank you very much for your help! The vignette of the goseq package can be found here (http://www.bioconductor.org/packages/2.8/bioc/vignettes/goseq/inst/doc/goseq.pdf).

Thanks!

Best,
Gu Mi
GO Category goseq • 3.7k views
@alicia-oshlack-4634
Hi Gu Mi,

I think your interpretation of over-represented and under-represented is correct. Over-represented means that there are more DE genes in the category than we would expect given the size of the category and the gene length distribution, so the category is enriched for DE genes. Under-represented means that there are fewer DE genes in the category than we would expect by chance. Each p-value is the probability of observing a count of DE genes in the category at least that extreme by chance: at least as many as observed for the over-represented p-value, and at most as many as observed for the under-represented p-value.

I hope that answers your question.

Cheers,
Alicia

Date: Thu, 23 Jun 2011 23:01:13 -0700
From: Gu Mi <neo.migu@gmail.com>
To: bioconductor at r-project.org
Subject: [BioC] goseq - overrepresented p-values
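To make the two tail probabilities concrete, here is a minimal sketch in Python. It is not goseq's own code: by default goseq corrects for gene length using a Wallenius approximation, whereas this sketch assumes the plain hypergeometric distribution with no length correction (goseq offers this as method="Hypergeometric"). The function names and the gene counts are hypothetical, chosen only for illustration. The sketch also touches on the sign question: comparing the observed DE count with the expected count gives the direction.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): exactly k DE genes fall in a category of K genes,
    when n of the N assayed genes are DE (no length-bias correction)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def representation_pvalues(k, N, K, n):
    """Return (over, under) tail probabilities for an observed count k.

    over  = P(X >= k): at least as many DE genes as observed
    under = P(X <= k): at most as many DE genes as observed
    """
    lowest = max(0, n - (N - K))   # smallest possible count in the category
    highest = min(K, n)            # largest possible count in the category
    over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    under = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    return over, under


# Hypothetical numbers: 1000 genes assayed, 100 of them DE,
# and a category of 50 genes of which 12 are DE.
N, n, K, k = 1000, 100, 50, 12
over_p, under_p = representation_pvalues(k, N, K, n)

# Expected DE count in the category under the null (here 100 * 50 / 1000 = 5).
expected = n * K / N
direction = "over-represented" if k > expected else "under-represented"

print(f"observed={k}, expected={expected:.1f}, direction={direction}")
print(f"over_represented_pvalue={over_p:.3g}, under_represented_pvalue={under_p:.3g}")
```

Both tails include the observed count itself, so the two p-values sum to slightly more than 1. A small over-represented p-value goes with a large under-represented p-value and vice versa, which is why a ranking on one column alone already separates enriched from depleted categories.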