Hi, Arne:
I am a research scientist at GeneGo, Inc., and most of my work focuses
on classifying microarray data from a pathway or gene-set perspective.
The short answer to your question is yes: gene set analysis, rather
than single-gene analysis, helps classification. But giving the answer
is easy; convincing people is much harder. When classifying with
microarray experiments of limited sample size, an attempt to validate
that "the pathway approach is better than the gene approach" never
comes out significant, and thus accuracy should not be the only
criterion. As you mentioned, pathway analysis has an advantage over
gene analysis because it provides a better understanding of the
biological mechanism, and that should also be taken into account when
you want to know whether it helps or not. If you find the right
mechanism (of course, how to define "right" depends on whether the
result can be repeated, and beyond that the discussion becomes a
philosophical question), theoretically it guarantees your
"robustness", whereas most of the time robustness relies on the sample
size you have: the more samples, the more convinced you can be.
Therefore, pathway analysis can be considered one possible solution to
the limited sample size issue, IMO.
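
To make the comparison concrete, here is a minimal Python sketch of
the kind of experiment under discussion. It is not a method from any
paper: the function names, the mean-expression gene set score, the
nearest-centroid classifier and the simulated data are all assumptions
chosen for illustration. It classifies the same samples once on the
gene matrix and once on a gene set matrix, using leave-one-out
cross-validation.

```python
import random
import statistics


def gene_set_scores(samples, gene_sets):
    """Collapse a samples-by-genes matrix into a samples-by-gene-sets matrix
    by averaging the expression of each set's member genes (assumed score)."""
    return [[statistics.mean(row[g] for g in members) for members in gene_sets]
            for row in samples]


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean training profile is closest
    (squared Euclidean distance)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [statistics.mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(x, y):
    """Leave-one-out cross-validated accuracy of the nearest-centroid rule."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, x[i]) == y[i]
    return hits / len(x)


def simulate(n_per_group=10, n_genes=200, set_size=10, shift=0.5, seed=0):
    """Toy data (hypothetical): the genes of the first gene set are shifted
    by `shift` in the treated group; every other gene is pure noise."""
    rng = random.Random(seed)
    gene_sets = [list(range(s, s + set_size))
                 for s in range(0, n_genes, set_size)]
    x, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in gene_sets[0]:
                    row[g] += shift
            x.append(row)
            y.append(label)
    return x, y, gene_sets


if __name__ == "__main__":
    x, y, gene_sets = simulate()
    print("gene-level LOOCV accuracy:    ", loocv_accuracy(x, y))
    print("gene-set-level LOOCV accuracy:",
          loocv_accuracy(gene_set_scores(x, gene_sets), y))
```

With only ten samples per group, the two accuracies will vary a lot
from one seed to the next, which is the limited-sample-size problem
described above. A real comparison would need curated gene sets, a
proper scoring method and repeated resampling.
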
This is just my two cents, from my own experience and research.
HTH,
Weiwei
On 7/3/07, arne.mueller at novartis.com <arne.mueller at novartis.com> wrote:
> Hello,
>
> first of all, I have to apologize for this slightly off-topic posting,
> but I could not find a better place to ask this question ;-). Well,
> it's not really a question but more of a brainstorming.
>
> I was (unsuccessfully) looking for some literature that reports the
> comparison of classical gene-centric versus gene set enrichment or
> over-representation analysis (biological functions such as in GO,
> KEGG etc ...). There are plenty of papers that describe specific
> tools and methods for finding potentially impacted biological
> functions in microarray experiments. However, so far I couldn't find
> any literature that rigorously compares the classification
> performance (e.g. via some standard machine learning method) of,
> let's say, treated and healthy sample groups based on a matrix of
> gene expression values or a matrix of gene sets (either all gene
> sets that one can think of, or a subset that was considered
> important by a certain method such as GSEA according to a cut-off).
> Obviously there'd be many variables to be explored, and factors that
> could impact such a comparison (classification method, number of
> variables, array pre/post processing, choice of gene set database
> etc ...).
>
> I don't have any doubts that gene set analysis helps understanding
> the effects of some treatment in a microarray experiment (I use it
> for looking at my data), but does it help in automated sample
> classification? Can a small subset of "biological functions"
> (grouped in a few gene sets) better discriminate between two sample
> groups than all genes (or a subset) on the chip (I assume that it is
> unlikely to choose exactly the same subset of genes in a
> gene-centric analysis as in a gene set centric approach)?
>
> Well, of course this depends on the types of gene sets defined, and
> we are aiming to reduce the data complexity as much as possible
> without losing important information, and to reduce noise.
>
> What'd be the most rigorous and formal analysis that has been
> published on this (one that does not necessarily have a focus on
> selling a specific method)?
>
>
> thanks a lot for your input,
> kind regards,
>
> Arne
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III