Hi Sean,
In this situation I would hope it is a one-sided test. I had this
same discussion with a colleague who wanted the same thing. I don't
think testing for under-representation means anything. Think about
the context: one is doing recursive sampling of a finite population
for which there are two sources of bias, what is represented in the
database or on the chip, and what is annotated on the chip. Further,
you are testing at each node the discrepancy from random; as you go
down the DAG, zero becomes more and more probable. You can think of
it as doing a mark-recapture study on your genes. This problem is
exacerbated by the sampling bias. Finally, a last complication is
that the test is further biased by your ability to detect
differentially expressed genes. At least if you detect
over-representation you can argue for a strong signal.
Nicholas
>
> Message: 4
> Date: Wed, 22 Dec 2004 11:02:55 -0500
> From: Sean Davis <sdavis2@mail.nih.gov>
> Subject: [BioC] GoHyperG
> To: Bioconductor <bioconductor@stat.math.ethz.ch>
> Message-ID: <f0ee8e4b-5432-11d9-accb-000d933565e8@mail.nih.gov>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Just a quick question--are the p-values from gohyperg one- or
> two-sided? I have a collaborator who would like to use it to
> determine underrepresented ontology categories.
>
> Thanks,
> Sean
>
>
>
On Dec 23, 2004, at 10:52 AM, Nicholas Lewin-Koh wrote:
> Hi Sean,
> In this situation I would hope it is a one-sided test. I had this
> same discussion with a colleague who wanted the same thing. I don't
> think testing for under-representation means anything. Think about
> the context: one is doing recursive sampling of a finite population
> for which there are two sources of bias, what is represented in the
> database or on the chip, and what is annotated on the chip. Further,
> you are testing at each node the discrepancy from random; as you go
> down the DAG, zero becomes more and more probable. You can think of
> it as doing a mark-recapture study on your genes. This problem is
> exacerbated by the sampling bias. Finally, a last complication is
> that the test is further biased by your ability to detect
> differentially expressed genes. At least if you detect
> over-representation you can argue for a strong signal.
I'm being a bit dense, but suppose I have 10000 genes on a chip
(annotated in ontology Y), 1000 of which are annotated as category X; I
find 1000 differentially-expressed genes (annotated in ontology Y) from
that chip, but only 12 are from category X. Is that not interesting to
know about?
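For concreteness, those numbers can be plugged into the two one-sided
hypergeometric tails directly. A minimal pure-Python sketch (this is not
the GOHyperG code; the helper names are invented for illustration):

```python
# One-sided hypergeometric tail probabilities for the example in the
# thread: 10000 genes on the chip, 1000 annotated to category X,
# 1000 differentially expressed, only 12 of them in X.
# Pure Python via log-gamma; a sketch, not the GOHyperG implementation.
from math import lgamma, exp

def log_choose(n, k):
    # log of the binomial coefficient C(n, k)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hyper_pmf(k, M, K, N):
    # P(X = k): k category hits when drawing N genes from M, K in category
    return exp(log_choose(K, k) + log_choose(M - K, N - k) - log_choose(M, N))

def under_rep_p(k, M, K, N):
    # one-sided p-value for under-representation: P(X <= k)
    return sum(hyper_pmf(i, M, K, N) for i in range(k + 1))

def over_rep_p(k, M, K, N):
    # one-sided p-value for over-representation: P(X >= k)
    return sum(hyper_pmf(i, M, K, N) for i in range(k, min(K, N) + 1))

p_under = under_rep_p(12, 10000, 1000, 1000)  # expected in-category count is 100
```

With an expected count of 100 in-category genes, observing 12 gives a
vanishingly small lower tail, so under the plain hypergeometric model
the example would indeed register as extreme under-representation;
Nicholas's objection is about whether that model's assumptions hold, not
the arithmetic.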
As for finding zeros, as zero becomes more probable as one moves down
the DAG, of course finding "underrepresented" groups becomes
prohibitively difficult, but for large categories it is certainly
possible. As for biases, I'm not sure that I agree that the ability to
detect differentially-expressed genes is a source of "bias". It is
certainly a limitation, but I don't think a bias. And I'm not sure what
"sampling bias" might be present?
Thanks for the food for thought.
Sean
Hi Sean,
My answer is below.
On Thu, 23 Dec 2004 11:29:45 -0500, "Sean Davis" <sdavis2@mail.nih.gov>
said:
> I'm being a bit dense, but suppose I have 10000 genes on a chip
> (annotated in ontology Y), 1000 of which are annotated as category X;
> I find 1000 differentially-expressed genes (annotated in ontology Y)
> from that chip, but only 12 are from category X. Is that not
> interesting to know about?
I'd say probably not. More likely the 12 genes represent
over-representation somewhere down the DAG, or they are due to genes
that overlap categories and are part of another set of groups that is
expressed. If you do detect under-representation, how would you
interpret it? I don't see how there would be a biological
interpretation (mind you, I am not a biologist) unless you had a
distinct hypothesis about a group that should be expressed under the
treatment, in which case this is probably the wrong approach and
something like Jelle Goeman's global test would be much more
appropriate.
>
> As for finding zeros, as zero becomes more probable as one moves down
> the DAG, of course finding "underrepresented" groups becomes
> prohibitively difficult, but for large categories it is certainly
> possible. As for biases, I'm not sure that I agree that the ability
> to detect differentially-expressed genes is a source of "bias". It is
> certainly a limitation, but I don't think a bias. And I'm not sure
> what "sampling bias" might be present?
Look at the parameters in the hypergeometric. The idea behind the
hypergeometric is sampling from a finite population. We have a finite
population N, but N is conditional on the probes being annotated and
represented on the chip. So from that perspective we are conditionally
unbiased. But at each level of refinement in GO we can expect that
annotation will be more variable, so we are "losing" genes as the
functions become more refined. It is like dropping marbles through
leaky pipes and trying to estimate the total by what drops through at
the bottom.
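That leaky-pipes picture is easy to mock up. A toy simulation (the
retention probability and depth are made-up assumptions, nothing
GO-specific):

```python
# Toy version of the "leaky pipes" point: if each gene keeps its
# annotation with some probability at every level of refinement, counts
# at deep nodes shrink geometrically, so a zero at a deep node need not
# reflect biological under-representation. retain/depth are assumptions.
import random

random.seed(0)
genes_at_root = 1000
retain = 0.6      # assumed per-level chance an annotation survives refinement
depth = 5

counts = [genes_at_root]
for _ in range(depth):
    survivors = sum(1 for _ in range(counts[-1]) if random.random() < retain)
    counts.append(survivors)
# counts decays roughly like 1000 * 0.6**level
```

The observed count at the bottom says as much about the per-level
annotation loss as about the underlying category size, which is the
sampling bias at issue.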
Anyway, I'm drinking eggnog as I write, so I may not be making as much
sense as I think I am.
A merry Christmas to you.
Nicholas
>
> Thanks for the food for thought.
>
> Sean
>