Re: GoHyperG
@nicholas-lewin-koh-63
Hi Sean,

In this situation I would hope it is a one-sided test. I had this same discussion with a colleague who wanted the same thing. I don't think testing for under-representation means anything. Think about the context: one is doing recursive sampling of a finite population for which there are two sources of bias, what is represented in the database or on the chip, and what is annotated on the chip. Further, you are testing at each node the discrepancy from random; as you go down the DAG, zero becomes more and more probable. You can think of it as doing a mark-recapture study on your genes. This problem is exacerbated by the sampling bias. Finally, a last complication is that the test is further biased by your ability to detect differentially expressed genes. At least if you detect over-representation you can argue for a strong signal.

Nicholas

> Message: 4
> Date: Wed, 22 Dec 2004 11:02:55 -0500
> From: Sean Davis <sdavis2@mail.nih.gov>
> Subject: [BioC] GoHyperG
> To: Bioconductor <bioconductor@stat.math.ethz.ch>
> Message-ID: <f0ee8e4b-5432-11d9-accb-000d933565e8@mail.nih.gov>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Just a quick question--are the p-values from gohyperg one- or
> two-sided? I have a collaborator who would like to use it to determine
> underrepresented ontology categories.
>
> Thanks,
> Sean
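[For readers following along: GOHyperG is an R/Bioconductor function, but the one-sided vs. two-sided distinction being discussed is just the two tails of the hypergeometric distribution. Here is a sketch of that arithmetic in Python with SciPy; the counts are made-up illustration values, and this is not GOHyperG's actual code.]

```python
# Illustrative sketch of one-sided hypergeometric p-values for a GO
# category (not the GOHyperG implementation). All numbers are invented.
from scipy.stats import hypergeom

N_genes = 10000    # annotated genes on the chip (the finite population)
n_in_cat = 1000    # of those, genes annotated to the category
n_selected = 1000  # differentially expressed genes drawn from the chip
k_observed = 150   # selected genes that fall in the category

# Over-representation (the direction Nicholas argues for): P(X >= k)
p_over = hypergeom.sf(k_observed - 1, N_genes, n_in_cat, n_selected)

# Under-representation (what Sean's collaborator wants): P(X <= k)
p_under = hypergeom.cdf(k_observed, N_genes, n_in_cat, n_selected)

print(p_over, p_under)
```

With 150 observed against an expectation of 100, the over-representation tail is tiny while the under-representation tail is near 1, so the two one-sided tests answer genuinely different questions.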
GO • 823 views
@sean-davis-490
On Dec 23, 2004, at 10:52 AM, Nicholas Lewin-Koh wrote:

I'm being a bit dense, but suppose I have 10000 genes on a chip (annotated in ontology Y), 1000 of which are annotated as category X; I find 1000 differentially-expressed genes (annotated in ontology Y) from that chip, but only 12 are from category X. Is that not interesting to know about?

As for finding zeros: since they become more probable as one moves down the DAG, of course finding "underrepresented" groups becomes prohibitively difficult there, but for large categories it is certainly possible. As for biases, I'm not sure that I agree that the ability to detect differentially-expressed genes is a source of "bias". It is certainly a limitation, but I don't think a bias. And I'm not sure what "sampling bias" might be present?

Thanks for the food for thought.
Sean
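[Sean's hypothetical numbers can be run through the hypergeometric directly. A sketch with SciPy, again only illustrating the distribution rather than GOHyperG itself:]

```python
# Sean's example: 10000 annotated genes, 1000 in category X,
# 1000 differentially expressed, of which only 12 are in X.
from scipy.stats import hypergeom

rv = hypergeom(10000, 1000, 1000)  # population, category size, draws
expected = rv.mean()               # genes expected in X by chance
p_under = rv.cdf(12)               # P(X <= 12), the under-representation tail

print(expected, p_under)
```

The expectation is 100 genes, so observing 12 gives a vanishingly small lower-tail p-value; numerically, at least, the depletion Sean describes is extreme. Whether it is biologically interpretable is the point Nicholas disputes below.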
Hi Sean,

My answer is below.

On Thu, 23 Dec 2004 11:29:45 -0500, "Sean Davis" <sdavis2@mail.nih.gov> said:

> I'm being a bit dense, but suppose I have 10000 genes on a chip
> (annotated in ontology Y), 1000 of which are annotated as category X; I
> find 1000 differentially-expressed genes (annotated in ontology Y) from
> that chip, but only 12 are from category X. Is that not interesting to
> know about?

I'd say probably not. More likely the 12 genes represent over-representation somewhere down the DAG, or they are due to genes that overlap categories and are part of another set of groups that is expressed. If you do detect under-representation, how would you interpret it? I don't see how there would be a biological interpretation (mind you, I am not a biologist) unless you had a distinct hypothesis about a group that should be expressed under the treatment, in which case this is probably the wrong approach and something like Jelle Goeman's global test would be much more appropriate.
> As for finding zeros, as it becomes more probable as one moves down the
> DAG, of course finding "underrepresented" groups becomes prohibitively
> difficult, but for large categories is certainly possible. As for
> biases, I'm not sure that I agree that ability to detect
> differentially-expressed genes is a source of "bias". It is certainly
> a limitation, but I don't think a bias. And I'm not sure what
> "sampling bias" might be present?

Look at the parameters in the hypergeometric. The idea behind the hypergeometric is sampling from a finite population. We have a finite population N, but N is conditional on the probes being annotated and represented on the chip. So from that perspective we are conditionally unbiased. But at each level of refinement in GO we can expect that annotation will be more variable, so we are "losing" genes as the functions become more refined. It is like dropping marbles through leaky pipes and trying to estimate the total by what drops through at the bottom.

Anyway, I'm drinking eggnog as I write, so I may not be making as much sense as I think I am. A merry Christmas to you.

Nicholas
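[Nicholas's "leaky pipes" picture can be made concrete with a toy simulation: if each gene's annotation survives each refinement step down the DAG with some probability, zero counts in a category become increasingly likely with depth. The retention rate, category size, and depths below are invented purely for illustration.]

```python
# Toy simulation of annotation loss down the GO DAG ("leaky pipes").
# All parameter values are made up; this is an illustration, not a model
# fitted to any real annotation data.
import random

random.seed(0)
retention = 0.5   # prob. an annotation survives one refinement step (assumed)
start_count = 40  # genes annotated to the category at the top level (assumed)
trials = 2000

def frac_zero_at_depth(depth):
    """Fraction of trials where the category count drops to zero by `depth`."""
    zeros = 0
    for _ in range(trials):
        count = start_count
        for _ in range(depth):
            # each gene independently keeps its annotation with prob `retention`
            count = sum(1 for _ in range(count) if random.random() < retention)
        zeros += (count == 0)
    return zeros / trials

for depth in (2, 4, 6, 8):
    print(depth, frac_zero_at_depth(depth))
```

The fraction of all-zero categories climbs steeply with depth, which is the sense in which observing zero genes at a deep node carries little evidence of biological under-representation.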
