GSEA, HyperGeo vs gene-centric analysis


arne.mueller@novartis.com ▴ 200

@arnemuellernovartiscom-2205

Last seen 10.3 years ago

Switzerland

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20070703/ 7904855b/attachment.pl


ADD COMMENT • link updated 18.6 years ago by Weiwei Shi ★ 1.2k • written 18.6 years ago by arne.mueller@novartis.com ▴ 200


Weiwei Shi ★ 1.2k

@weiwei-shi-1407

Last seen 11.4 years ago

Hi Arne,

I am a research scientist at GeneGo, Inc., and most of my work focuses on classifying microarray data from a pathway or gene-set perspective.

The short answer to your question is yes: gene set analysis, rather than single-gene analysis, helps classification. Giving that answer is easy, though; convincing people is much harder. When classification is done on the limited sample sizes typical of microarray experiments, a test of the claim "the pathway approach is better than the gene approach" consistently fails to reach significance, so accuracy should not be the only criterion.

As you mentioned, pathway analysis is better than gene analysis because it gives a better understanding of the biological mechanism, and that should also be taken into account when you ask whether it helps. If you find the right mechanism (how to define "right", for example by whether the result can be repeated, quickly becomes a philosophical question), then in theory it guarantees robustness, whereas most of the time robustness depends on the sample size you have. The more samples, the more convinced you can be. Pathway analysis can therefore be considered one possible solution to the limited-sample-size problem, IMO.

These are just my two cents, from my own experience and research.

HTH,
Weiwei

On 7/3/07, arne.mueller at novartis.com wrote:
> Hello,
>
> first of all, I have to apologize for this slightly off-topic posting, but
> I could not find a better place to ask this question ;-). Well, it's not
> really a question but more of a brainstorming.
>
> I was (unsuccessfully) looking for literature that compares classical
> gene-centric analysis with gene set enrichment or over-representation
> analysis (biological functions such as in GO, KEGG etc.). There are plenty
> of papers that describe specific tools and methods for finding potentially
> impacted biological functions in microarray experiments. However, so far I
> couldn't find any literature that rigorously compares the classification
> performance (e.g. via some standard machine learning method) of, let's say,
> treated and healthy sample groups based on a matrix of gene expression
> values versus a matrix of gene sets (either all gene sets one can think of,
> or a subset that was considered important by a certain method such as GSEA
> according to a cut-off). Obviously there would be many variables to explore
> and factors that could impact such a comparison (classification method,
> number of variables, array pre/post-processing, choice of gene set database
> etc.).
>
> I don't have any doubt that gene set analysis helps in understanding the
> effects of a treatment in a microarray experiment (I use it for looking at
> my data), but does it help in automated sample classification? Can a small
> subset of "biological functions" (grouped in a few gene sets) discriminate
> between two sample groups better than all genes (or a subset) on the chip?
> (I assume it is unlikely that a gene-centric analysis would choose exactly
> the same subset of genes as a gene-set-centric approach.)
>
> Of course this depends on the types of gene sets defined, and we are aiming
> to reduce the data complexity as much as possible, without losing important
> information, and to reduce noise.
>
> What would be the most rigorous and formal analysis that has been published
> on this (one that does not necessarily focus on selling a specific method)?
>
> thanks a lot for your input,
> kind regards,
>
> Arne

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
--- Matrix III
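
For readers who want to try the comparison discussed in this thread, here is a minimal sketch, not a method taken from either post. It runs the same classifier once on a gene-level matrix and once on a gene-set-level matrix and compares leave-one-out accuracy. Everything in it is an illustrative assumption: the data are synthetic, the gene sets are arbitrary consecutive blocks of genes, the gene-set score is a plain mean over member genes (not a GSEA or hypergeometric statistic), and the classifier is a simple nearest-centroid rule. The helper names (`make_data`, `to_set_scores`, `loocv_accuracy`) are hypothetical.

```python
import random
from statistics import mean


def make_data(n_per_class=10, n_genes=60, set_size=10, shift=0.8, seed=0):
    """Build a synthetic samples-by-genes matrix (assumption: Gaussian noise).

    Genes are grouped into consecutive, non-overlapping sets; only the
    first set is shifted upward in class 1.
    """
    rng = random.Random(seed)
    gene_sets = [list(range(i, i + set_size)) for i in range(0, n_genes, set_size)]
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in gene_sets[0]:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y, gene_sets


def to_set_scores(X, gene_sets):
    """Collapse each sample to one score per gene set (mean of member genes)."""
    return [[mean(row[g] for g in gs) for gs in gene_sets] for row in X]


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([x for x, l in train if l == lab])
            for lab in {l for _, l in train}
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


X, y, gene_sets = make_data()
acc_genes = loocv_accuracy(X, y)
acc_sets = loocv_accuracy(to_set_scores(X, gene_sets), y)
print(f"gene-level LOOCV accuracy:     {acc_genes:.2f}")
print(f"gene-set-level LOOCV accuracy: {acc_sets:.2f}")
```

With only 20 samples, a difference between the two accuracies will often fall within sampling noise, which is the point made in the reply above. Repeating the run over several seeds and sample sizes gives a better feel for how much of any difference is stable.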

ADD COMMENT • link 18.6 years ago Weiwei Shi ★ 1.2k
