"automatic association analysis"
Weiwei Shi
Dear Listers:

I have a question arising from pathway analysis. Suppose I have found a pathway that is strongly associated with a disease in a pathway analysis. How can I validate this finding? More specifically, is there any tool that automatically checks such an association against the scientific literature, such as PubMed, and gives some measure of its strength?

Thanks.

--
Weiwei Shi, Ph.D.
Research Scientist
GeneGO, Inc.

"Did you always know?" "No, I did not. But I believed..." ---Matrix III
Francois Pepin
Hi Weiwei,

If you want to know whether a given set of genes (i.e., the members of the pathway) behaves differently in a given set of arrays (i.e., your disease samples), there are a few ways to do it. The basic approach is a hypergeometric test (often used for GO), although it can be tricky to get right and has a few other issues.

There are other methods, such as the Gene Set Enrichment method in the Category package, that combine a set of t-tests. Other packages, like safe and sigPathway, use different methods to do the same thing. There was a recent discussion of this on the mailing list; you would probably want to look it over.

As far as I can tell, all of those methods require that your pathway already be defined. Some databases, like KEGG or BioCarta, have pathway definitions, but they don't cover all pathways, and few, if any, are up to date with the literature.

If we really care about a given pathway, we create our own gene list from the database ourselves. In that case it is important to create the list before you start looking at the differentially expressed genes; otherwise you bias your results. Of course, it is always good to be able to explain your results biologically afterward, but that is not the same as showing a statistically significant association with a pathway.

Hope this helps,
Francois

(Reply to Weiwei Shi's message of Thu, 2006-08-24 at 18:57 -0400.)
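Francois describes the Category approach only as combining a set of t-tests. The Python sketch below shows one generic way to do that, not Category's actual implementation: it averages per-gene Welch t-statistics over the pathway's genes and judges the average by permuting the disease/control labels across arrays. The function names, data layout, and toy data are all hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t-statistic comparing group x with group y.

    Each group needs at least two values that are not all identical.
    """
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def set_score(expr, labels, gene_set):
    """Average of the per-gene t-statistics over the genes of one pathway.

    expr:     dict gene -> list of expression values, one per array
    labels:   list of booleans, True for a disease array, False for a control
    gene_set: pathway genes; genes missing from expr are ignored
    """
    genes = [g for g in gene_set if g in expr]
    if not genes:
        raise ValueError("no pathway genes are present in the expression data")
    scores = []
    for gene in genes:
        values = expr[gene]
        x = [v for v, is_disease in zip(values, labels) if is_disease]
        y = [v for v, is_disease in zip(values, labels) if not is_disease]
        scores.append(welch_t(x, y))
    return mean(scores)


def gene_set_permutation_test(expr, labels, gene_set, n_perm=1000, seed=0):
    """Return (observed score, two-sided permutation p-value).

    The null distribution comes from shuffling the disease/control labels
    across arrays, which keeps the correlation between genes intact.
    """
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(expr, shuffled, gene_set)) >= abs(observed):
            extreme += 1
    # Add-one correction: the observed labeling counts as one permutation.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data: 3 disease and 3 control arrays; g1 and g2 form the "pathway".
labels = [True, True, True, False, False, False]
expr = {
    "g1": [5.1, 5.3, 4.9, 1.0, 1.2, 0.8],
    "g2": [6.0, 6.2, 5.8, 2.1, 1.9, 2.0],
    "g3": [3.0, 2.9, 3.1, 3.0, 3.2, 2.8],
}
score, p = gene_set_permutation_test(expr, labels, ["g1", "g2"])
```

With only six arrays there are just 20 distinct labelings, two of which (the observed one and its mirror image) are as extreme as the data, so the p-value here stays near 0.1 however strong the difference looks. Permuting samples rather than genes is what keeps gene-gene correlation in the null distribution, but it needs a reasonable number of arrays.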
Weiwei Shi
Hi Francois and other listers,

Thank you for the detailed reply. Actually, I have read the papers on GO enrichment analysis and gene set analysis. Statistically, there are basically two approaches: Bayesian and frequentist. The latter can use a hypergeometric test or t-tests to derive p-values. Currently I am using BayGO (implemented in R), which is based on Bayesian inference, and I have some interesting results on a psoriasis dataset.

My initial question was about how to automatically "validate" or "test" the results I get from whatever method I use, for example by text mining or something similar.

But you mentioned that "The basic way to do this would be to use an hypergeometric test (often used in the case of GO), although it can be tricky to get right and has a few other issues." This raises another question: how do you define the "success events" in a hypergeometric test? And how do you make sure the sampling is unbiased when you pick the genes in your study? I will look into this myself, but maybe someone here has some suggestions too.

As for the pathways, I am using GeneGO's internal Metabase.

Thank you,
Weiwei

(Reply to Francois Pepin's message of 8/25/06.)
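To make the "success events" concrete, here is one common way to set up the hypergeometric over-representation test, sketched in Python (the function names are hypothetical, not from any Bioconductor package). The urn is the gene universe, the marked balls are the universe genes in the pathway, the draws are the genes you selected (for example, those called differentially expressed), and a success is a selected gene that is also in the pathway. The p-value is the upper tail P(X ≥ k).

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn).

    N: genes in the universe (e.g. all genes represented on the chip)
    K: universe genes that belong to the pathway ("marked" genes)
    n: universe genes that were selected (e.g. called differentially expressed)
    k: observed successes, i.e. selected genes that are also in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # Sum the probabilities of k, k+1, ... successes; comb() returns 0
    # when the lower argument exceeds the upper one.
    upper = sum(comb(K, i) * comb(N - K, n - i)
                for i in range(max(k, 0), min(K, n) + 1))
    return upper / comb(N, n)


def pathway_overrepresentation(universe, pathway, selected):
    """Return (k, p-value) for over-representation of a pathway among selected genes.

    Both the pathway and the selected genes are first restricted to the
    universe, so genes that could never have been selected do not count.
    """
    universe = set(universe)
    pathway = set(pathway) & universe
    selected = set(selected) & universe
    k = len(pathway & selected)
    return k, hypergeom_upper_tail(k, len(universe), len(pathway), len(selected))
```

In R the same tail probability is `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`. Restricting everything to the universe is the part that matters for the sampling-bias question: genes that are not on the chip, or that were filtered out, must be excluded from N, K and n alike.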
Francois Pepin

Hi Weiwei,

> My initial question is about
> how to automatic "validate" or "test" the result I get from whatever
> methods i use, like text mining or something like that.

I think some packages may exist, but we do that by hand. Once we're pointed to a specific pathway, we prefer to let humans handle the rest.

> how do u define the "success events" in hypergeometric test? and how
> do you make sure the sampling has no bias when you pick genes in your
> study?

That's one of the tricky issues. People usually define them in terms of the differentially expressed genes, but choosing the threshold for calling a gene differentially expressed isn't obvious. One of the reasons some people do not like this approach (and I'm starting to feel the same way) is that the values are continuous, so changing the threshold by a hair changes your set of genes (often changing your results significantly).

I'm not sure what you mean about the sampling bias. If you filter in an unbiased way and set your universe to be the genes available on the chip, you should be OK. You should also deal with duplicate probes (if any) and multiple probes per gene (if any). Again, the archives have a couple of fairly detailed discussions of those issues.

Francois
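Two of Francois's practical points, reducing multiple probes to one value per gene before defining the universe, and the sensitivity of the selected gene set to the cutoff, can be illustrated with a small Python sketch. All names, the toy numbers, and the rule of keeping the highest-variance probe per gene are assumptions for illustration, not a recommendation from the thread.

```python
def collapse_probes(probe_stat, probe_var, probe_to_gene):
    """Reduce probe-level statistics to one statistic per gene.

    probe_stat:    dict probe -> test statistic (e.g. a t-statistic)
    probe_var:     dict probe -> variance across all arrays
    probe_to_gene: dict probe -> gene identifier, or None if unannotated

    When several probes map to the same gene, keep the probe with the
    largest overall variance. This choice does not use the disease labels.
    """
    chosen = {}  # gene -> probe kept for that gene
    for probe, gene in probe_to_gene.items():
        if gene is None or probe not in probe_stat:
            continue  # unannotated or untested probes are left out of the universe
        kept = chosen.get(gene)
        if kept is None or probe_var[probe] > probe_var[kept]:
            chosen[gene] = probe
    return {gene: probe_stat[probe] for gene, probe in chosen.items()}


def selected_genes(gene_stat, threshold):
    """Genes whose |statistic| reaches the cutoff: the "drawn" genes."""
    return {gene for gene, stat in gene_stat.items() if abs(stat) >= threshold}


# Toy example: two probes for gene A, one unannotated probe (p5).
probe_to_gene = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": None}
probe_stat = {"p1": 2.05, "p2": -0.3, "p3": 1.2, "p4": -3.0, "p5": 4.0}
probe_var = {"p1": 0.9, "p2": 0.4, "p3": 1.1, "p4": 0.7, "p5": 2.0}

gene_stat = collapse_probes(probe_stat, probe_var, probe_to_gene)
universe = set(gene_stat)  # the genes actually measured on the chip
print(sorted(selected_genes(gene_stat, 2.0)))  # ['A', 'C']
print(sorted(selected_genes(gene_stat, 2.1)))  # ['C']
```

Moving the cutoff from 2.0 to 2.1 drops gene A from the selected set, which changes n, and possibly k, in any hypergeometric test built on it; that is the instability Francois describes.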
Jianping Jin

Hi Weiwei and Francois,

If I understand correctly, you are worried about false positives, aren't you? If so, we usually use the Benjamini & Hochberg FDR procedure to correct the raw p-values obtained from the hypergeometric test in GO analysis. We do that manually in R/Bioconductor, or even in Microsoft Excel.

Cheers!
Jianping

(Reply to Francois Pepin's message of Friday, August 25, 2006 2:09 PM -0400.)

Jianping Jin, Ph.D.
Bioinformatics Scientist
Center for Bioinformatics
Room 3133 Bioinformatics Building, CB# 7104
University of Chapel Hill
Chapel Hill, NC 27599
Phone: (919)843-6105
FAX: (919)843-3103
E-Mail: jjin at email.unc.edu
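The Benjamini & Hochberg correction Jianping mentions is short enough to write out. The Python sketch below (the function name is hypothetical and the four p-values are invented for illustration) follows the standard step-up definition and should agree with R's `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the i-th smallest of m p-values the adjusted value is
    min over j >= i of (m / j) * p_(j), capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from hypergeometric tests on four GO categories.
raw = [0.01, 0.04, 0.03, 0.2]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q <= 0.05]
```

Here only the first category stays at or below an FDR of 0.05: its adjusted value is 0.04, while the two middle categories both become 4/75 (about 0.053).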