ChIPpeakAnno venn diagram statistics
@ester-feldmesser-2270
Last seen 10.6 years ago
European Union
Hello,

I would like to understand how the hypergeometric test is applied in the makeVennDiagram function, specifically what the total, the sample and the success groups are.

Let's say we have two peak BED files with 3912 and 26009 peaks respectively and an overlap of 2577 peaks. How should the test be applied in this case?

Thank you,
Ester Feldmesser
Noah Dowell ▴ 410
@noah-dowell-3791
Last seen 11.1 years ago
Hello Ester,

Did you search the archives? I commented on your question extensively, and Julie has also offered helpful insight; those messages are in the archives.

Best,
Noah

On Dec 6, 2010, at 4:09 AM, Ester Feldmesser wrote:
> I would like to understand how the hypergeometric test is applied in the makeVennDiagram function, specifically what is the total, the sample and the success groups. [...]

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
Hello Noah,

I read the archives, but some points are still not clear to me.

1. How is the hypergeometric test implemented? In other words, if we use the R function phyper, what would be p, m and k in the example given below?

2. Does anybody have any additional ideas on how to calculate totalTest when comparing the peaks of two different transcription factors?

3. Is there any other statistical test to calculate the significance of the overlap between peaks?

Thanks,
Esti

Ester Feldmesser, Ph.D.
Bioinformatics Unit, Department of Biological Services
Weizmann Institute of Science
Levine Building, Room 110
phone: +972-8-934-2614
email: ester.feldmesser@weizmann.ac.il

He who thinketh he leadeth and hath no one following him is only taking a walk.
Anonymous

On 12/6/2010 9:16 PM, Noah Dowell wrote:
> Did you search the archives? I commented on your question extensively and Julie has also offered helpful insight and those messages are in the archives.
>
> On Dec 6, 2010, at 4:09 AM, Ester Feldmesser wrote:
>> Let's say we have two peak bed files with 3912 and 26009 peaks respectively and an overlap of 2577 peaks, how in this case should the test be applied?
Hello Esti,

Julie covered some points, but I'll quickly chime in too.

> 1. How is the hypergeometric test implemented, in other words if we use the phyper R function, what would be p, m and k in the example given below.

Simply printing the function (makeVennDiagram) to the screen will show you the implementation of the hypergeometric test.

> 2. Has somebody any additional idea how to calculate the totalTest when comparing between the two different transcription factor peaks?

I gave an extensive answer (see the archives) for both a sequence-dependent factor (transcription factor) and a sequence-independent factor (histone), and on how to estimate a range for totalTest. I think one way to estimate the upper limit for transcription factor binding is to count the number of DNA motifs for that factor in the genome. What are your thoughts? For two different factors, I would count how often the two motifs co-occur and how often they are distinctly represented. This would require you to assume some distance for "co-occurrence", for example within 1 kb, 5 kb or 0.5 kb. The possibilities are endless, but describe your methods (and assumptions) clearly and the community may offer more insight.

> 3. Is there any other statistical test to calculate significance between overlapping peaks?

Like what? Some people like to do scatter plots and then use the ensuing correlation coefficients as a readout for "overlap." One word of caution, though, is to use the probes (from the array) that are called "bound" in both experiments when making the scatter plots. If you draw a scatter plot between all probes in two ChIP-chip experiments, I feel the "unbound" (null distribution) probes drive the correlation, and the result is not reflective of what is actually going on. Maybe others can comment on this.

Best,
Noah

On Dec 6, 2010, at 11:42 PM, Ester Feldmesser wrote:
> I read the archives, but still there are some points that are not clear to me. [...]
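One way to act on the suggestion to estimate a range for totalTest is to recompute the test over several candidate values and see how strongly the conclusion depends on them. The R sketch below does this for the counts from the original question. The candidate values in `total_test` are arbitrary assumptions chosen for illustration, not estimates for any genome, and the variable names are not part of ChIPpeakAnno.

```r
# Peak counts from the example in this thread.
n_a <- 3912      # peaks in the first BED file
n_b <- 26009     # peaks in the second BED file
overlap <- 2577  # peaks shared by both files

# Hypothetical candidate values for totalTest (assumptions, not estimates).
total_test <- c(3e4, 5e4, 1e5, 5e5, 1e6)

# Overlap expected by chance if the two peak sets were independent.
expected <- n_a * n_b / total_test

# Upper-tail probability P(X >= overlap), computed on the log scale so that
# very small p-values do not underflow to 0, then converted to log10.
log10_p <- phyper(overlap - 1, n_a, total_test - n_a, n_b,
                  lower.tail = FALSE, log.p = TRUE) / log(10)

print(data.frame(total_test, expected, log10_p))
```

With these counts, the chance-expected overlap at a totalTest of 30000 is larger than the observed 2577, so the upper-tail p-value is close to 1 there and drops sharply as totalTest grows. That sensitivity is the reason the chosen value, and the assumptions behind it, should be reported.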
I want to take this opportunity to thank Noah for sharing his insights and experience using the ChIPpeakAnno package.

Ester, here is how the p-value for the overlap is calculated using your given example:

phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail = FALSE)

Best regards,
Julie

On 12/7/10 2:42 AM, "Ester Feldmesser" <ester.feldmesser at weizmann.ac.il> wrote:
> 1. How is the hypergeometric test implemented, in other words if we use the phyper R function, what would be p, m and k in the example given below. [...]
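To make the roles of the arguments explicit, the call above can be read against the signature `phyper(q, m, n, k)`. This is only a sketch: `total_test <- 50000` is a placeholder assumption, since the thread does not settle on a value, and the variable names are illustrative.

```r
# Counts from the example in this thread.
peaks_a <- 3912    # "success" group: peaks in the first set
peaks_b <- 26009   # sample: peaks in the second set
overlap <- 2577    # successes observed in the sample
total_test <- 50000  # assumption for illustration only

# phyper(q, m, n, k) with lower.tail = FALSE returns P(X > q):
#   q = overlap - 1            so that the result is P(X >= overlap)
#   m = peaks_a                successes in the population
#   n = total_test - peaks_a   failures in the population
#   k = peaks_b                number of draws (sample size)
p_upper <- phyper(overlap - 1, peaks_a, total_test - peaks_a, peaks_b,
                  lower.tail = FALSE)

# The same tail written as an explicit sum of point probabilities.
p_sum <- sum(dhyper(overlap:min(peaks_a, peaks_b),
                    peaks_a, total_test - peaks_a, peaks_b))

print(c(p_upper = p_upper, p_sum = p_sum))
```

Subtracting 1 from the overlap matters because `lower.tail = FALSE` excludes `q` itself; without it, the probability of observing exactly 2577 overlapping peaks would be left out of the tail.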
Thank you very much for your answers and your patience.

Ester Feldmesser, Ph.D.

On 12/7/2010 11:42 PM, Zhu, Lihua (Julie) wrote:
> Ester, here is how the p-value is calculated for overlapping using your given example, phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail = FALSE).
Thank you very much for your help. I have several thoughts regarding the overlap between peaks in ChIP-seq analyses:

1. Could I also calculate the p-value in the following way for my example?

phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail = FALSE)

Since the results are not symmetric, and the two experiments have equal weight according to my understanding, I am not sure which is the right way to apply the test.

> phyper(2577-1, 3912, 30000-3912, 26009, lower.tail = FALSE)
[1] 1
> phyper(2577-1, 30000-26009, 26009, 3912, lower.tail = FALSE)
[1] 0

2. Regarding totalTest, I agree that taking only the peaks we see in the two experiments is probably an underestimate. On the other hand, counting the number of DNA motifs for that factor in the genome may give too high a number, because some of the motifs are probably not functional and appear in the genome by chance. I admit that it is easier to criticize than to find a solution, and I have not found a solution I am happy with.

Any ideas or comments will be highly appreciated.

Esti

Ester Feldmesser, Ph.D.

On 12/7/2010 11:42 PM, Zhu, Lihua (Julie) wrote:
> Ester, here is how the p-value is calculated for overlapping using your given example, phyper(2577-1, 3912, totalTest-3912, 26009, lower.tail = FALSE).
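On the symmetry question: the hypergeometric upper tail does not depend on which peak set is treated as the "success" group and which as the sample, as long as both calls describe the same 2 x 2 table. Note that the second call in the post above passes 30000-26009 as the success count, so it is not the role-swapped version of the first call. The sketch below checks the symmetry numerically and compares it with a one-sided Fisher's exact test, which computes the same tail probability. The helper `overlap_p` is hypothetical, and the 50000 used for `total_test` is an arbitrary assumption.

```r
# Hypothetical helper: P(overlap >= observed) under the hypergeometric model.
overlap_p <- function(n_success, n_sample, overlap, total_test) {
  phyper(overlap - 1, n_success, total_test - n_success, n_sample,
         lower.tail = FALSE)
}

peaks_a <- 3912
peaks_b <- 26009
overlap <- 2577
total_test <- 50000  # assumption for illustration only

p_ab <- overlap_p(peaks_a, peaks_b, overlap, total_test)  # A = success group
p_ba <- overlap_p(peaks_b, peaks_a, overlap, total_test)  # roles swapped

# The same question as a 2 x 2 table:
# rows = in B / not in B, columns = in A / not in A.
tab <- matrix(c(overlap, peaks_a - overlap,
                peaks_b - overlap, total_test - peaks_a - peaks_b + overlap),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater", conf.int = FALSE)$p.value

print(c(p_ab = p_ab, p_ba = p_ba, p_fisher = p_fisher))

# With totalTest = 30000, as in the post, both orderings are close to 1.
p_30k <- c(overlap_p(peaks_a, peaks_b, overlap, 30000),
           overlap_p(peaks_b, peaks_a, overlap, 30000))
print(p_30k)
```

Either ordering can therefore be used, provided the third argument is totalTest minus whichever count sits in the second argument.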