Hi, Francois and other listers:
Thank you for the detailed reply. I have read the papers on GO
enrichment analysis and gene set enrichment. There are basically two
statistical approaches: Bayesian and frequentist. The latter can use a
hypergeometric test or a t-test to derive p-values. Currently I am using
BayGO (implemented in R), which is based on Bayesian inference, and I
have some interesting results on a psoriasis dataset.
My initial question was about how to automatically "validate" or "test"
the results I get from whatever method I use, for example through text
mining.
But you mentioned that "The basic way to do this would be to use
a hypergeometric test (often used in the case of GO), although it can
be tricky to get right and has a few other issues.", which reminds me of
another question:
How do you define the "success events" in a hypergeometric test? And how
do you make sure the sampling is unbiased when you pick the genes in
your study?
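
To make the question concrete, here is the setup as I understand it; a
minimal sketch, not a definitive implementation. It assumes that a
"success" is a gene annotated to the pathway, that the "draws" are the
differentially expressed genes, and that the universe is the set of
genes actually measured on the array (not the whole genome). The
function name and the numbers in the usage example are hypothetical and
chosen only for illustration.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, selected_size, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    universe_size: genes measured on the array (the "urn")
    pathway_size:  genes in the universe annotated to the pathway ("successes")
    selected_size: differentially expressed genes (the "draws")
    overlap:       selected genes that are also in the pathway
    """
    if not 0 <= pathway_size <= universe_size:
        raise ValueError("pathway_size must lie between 0 and universe_size")
    if not 0 <= selected_size <= universe_size:
        raise ValueError("selected_size must lie between 0 and universe_size")

    lowest = max(overlap, 0)
    highest = min(pathway_size, selected_size)
    total = comb(universe_size, selected_size)
    # Sum the probabilities of seeing `overlap` or more pathway genes
    # among the selected genes, drawing without replacement.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, selected_size - k)
        for k in range(lowest, highest + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 12000 genes on the array, 40 in the pathway,
    # 300 differentially expressed, 6 of those in the pathway.
    print(hypergeom_enrichment_p(12000, 40, 300, 6))
```

In this setup both the pathway list and the selected list have to be
restricted to the same universe before counting the overlap, which is
one place where a bias could come in.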
I will look into this myself, but maybe someone here would like to give
me some suggestions too.
As for the pathway, I am using GeneGO's internal Metabase.
Thank you,
Weiwei
On 8/25/06, Francois Pepin <fpepin at aei.ca> wrote:
> Hi Weiwei,
>
> If you want to know if a given set of genes (i.e. members of the
> pathway) are behaving differently in a given set of arrays (i.e. your
> disease samples), there are a few ways. The basic way to do this would
> be to use a hypergeometric test (often used in the case of GO),
> although it can be tricky to get right and has a few other issues.
>
> There are other methods, such as the Gene Set Enrichment method in the
> Category package, that combine a set of t-tests together. Other
> packages like safe and sigPathway have different methods of doing the
> same thing. There was a discussion on this recently on the mailing
> list; you would probably want to look over it.
>
> As far as I can tell, all of those methods require that you have your
> pathway already defined. Some databases like KEGG or BioCarta have
> pathway definitions, but they don't cover all pathways and few, if
> any, are up-to-date with the literature.
>
> If we really care about a given pathway, we'll go and create our own
> list ourselves from the database. It is important in such a case to
> create the list before you've started looking at the differentially
> expressed genes, because otherwise you would be biasing your results.
> Of course, it is always good to be able to explain your results
> biologically afterward, but this is not the same as showing a
> statistically significant correlation with a pathway.
>
> Hope this helps,
>
> Francois
>
> On Thu, 2006-08-24 at 18:57 -0400, Weiwei Shi wrote:
> > Dear Listers:
> >
> > I have a question that originated from pathway analysis:
> >
> > Suppose I have found a pathway which strongly associates with a
> > disease from pathway analysis; my question is how to validate this
> > rule. I mean, is there any tool that does automatic association
> > analysis against the scientific record, like PubMed, and can give
> > some evaluation of the strength of such an association?
> >
> > thanks.
> >
>
>
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III