Question: MEDIPS vs MACS: why does MEDIPS generate more conservative p-values
3.9 years ago by
United States
tptacek3050 wrote:

I'm working on getting MEDIPS running for some collaborators who had previously been using MACS to analyze their MethylCap-seq data.

I've run MEDIPS on a data set that they had previously run through MACS. I only got one solid hit (p<0.05 after correction for multiple comparisons) using MEDIPS, while MACS yielded ~200. The single positive hit from MEDIPS was replicated in MACS, and the MACS data were corrected for multiple comparisons using the same method (BH).

After looking at the data, it appears that MEDIPS is wiping out nearly all of the p-values when adjusting for multiple comparisons. I spot-checked some of the MACS hits, and for every one I see an interval or two in MEDIPS with a significant raw p-value (<0.05) that gets adjusted to roughly 1. I've tried a few ways to "fix" this: increasing the window size (idea: larger windows = fewer intervals = fewer comparisons) and changing the correction method (I tried fdr, as it isn't overly conservative in my experience). Neither changed the outcome (still a single hit), although the exact corrected p-value did fluctuate.
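
The effect described above can be reproduced with a minimal Python sketch. This is not MEDIPS or MACS code: it implements the standard Benjamini-Hochberg (BH) adjustment and applies it to made-up p-values. Five windows with a raw p-value of 0.001 are mixed with evenly spaced "null" p-values; the window counts (100 versus 100,000 nulls) and the helper names `bh_adjust` and `adjusted_hit` are assumptions chosen only for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def adjusted_hit(n_null, n_hits=5, hit_p=0.001):
    """Adjusted p-value of one 'hit' window among n_null null windows.

    The null p-values are evenly spaced in (0, 1): illustrative data only.
    """
    nulls = [(i + 0.5) / n_null for i in range(n_null)]
    return bh_adjust([hit_p] * n_hits + nulls)[0]


few_tests = adjusted_hit(100)       # 105 tests in total
many_tests = adjusted_hit(100_000)  # 100,005 tests in total
print(few_tests, many_tests)
```

With about a hundred tests, the five windows stay below 0.05 after adjustment. With 100,000 additional tests, the same raw p-values are adjusted to above 0.9, even though nothing about those five windows has changed.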

 

Can anyone with a better understanding of the MEDIPS and MACS algorithms explain these differences?

medips p-value macs
Answer: MEDIPS vs MACS: why does MEDIPS generate more conservative p-values
3.9 years ago by
Lukas Chavez
USA/La Jolla/UCSD
Lukas Chavez wrote:
Dear tptacek3050,

thank you for your detailed comparison of MACS and MEDIPS!

MEDIPS was not designed to identify genomic regions enriched in DNA IP-seq over Input ("peaks"), because great tools already exist for this (e.g. MACS). Instead, MEDIPS aims to identify genomic regions that are differentially enriched between two conditions, taking into account technical and biological variation across replicates (enabled by edgeR).

For methylation-specific DNA IP-seq assays such as MeDIP-seq, MEDIPS can transform the DNA IP-seq data into methylation values by normalising for local CpG densities. However, differential DNA IP-seq coverage between conditions (or differential methylation, for methylation-specific assays) is calculated from the actual read counts, without any CpG density normalisation. Please also consider section 6.7 (Comments on the experimental design and Input data) of the latest MEDIPS vignette.

What you are encountering, however, is a multiple testing problem caused by the vast number of small genomic windows that MEDIPS tests for differential coverage. This is especially problematic when the number of replicates per group is small.

To approach this problem, you could increase the minRowSum parameter of the MEDIPS.meth() function (default=10 in MEDIPS v. 1.18.0). This removes a vast number of genomic regions with no or low coverage and so reduces the total number of tests applied. In fact, we are currently working on an optimised way to exclude genomic windows with low coverage before testing for differential coverage between conditions.

Alternatively, you could test only predefined regions of interest by using MEDIPS.createROIset() instead of MEDIPS.createSet(). With MEDIPS.createROIset() you can specify any set of genomic regions (e.g. peaks, CGIs, promoters etc.) via the parameter ROI. By the way, the parameter bn enables binning of the ROIs into smaller bins, if desired.
Please see also the example in ?MEDIPS.createROIset.

All the best,
Lukas
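
To make the minRowSum suggestion concrete, here is a rough Python sketch, not MEDIPS code, of how a coverage filter shrinks the number of tests and relaxes the raw p-value needed to pass a BH correction. The window counts are invented, the helper names `keep_windows` and `bh_cutoff` are hypothetical, and the rule that a window is kept when its summed count is at least the threshold is an assumption about how such a filter behaves. The cutoff uses the BH condition that the k-th smallest raw p-value passes if it is at most alpha * k / m for m tests.

```python
def keep_windows(row_sums, min_row_sum):
    """Indices of windows whose summed read count reaches the threshold."""
    return [i for i, total in enumerate(row_sums) if total >= min_row_sum]


def bh_cutoff(n_hits, n_tests, alpha=0.05):
    """Largest raw p-value at which the n_hits-th smallest p-value still passes BH."""
    return alpha * n_hits / n_tests


# Invented genome: mostly empty or sparsely covered windows.
row_sums = [0] * 150_000 + [4] * 40_000 + [25] * 9_000 + [300] * 1_000

tested_at_1 = keep_windows(row_sums, min_row_sum=1)
tested_at_10 = keep_windows(row_sums, min_row_sum=10)

# Raw p-value that five true hits would need to reach in each setting.
cutoff_at_1 = bh_cutoff(5, len(tested_at_1))
cutoff_at_10 = bh_cutoff(5, len(tested_at_10))

print(len(tested_at_1), cutoff_at_1)
print(len(tested_at_10), cutoff_at_10)
```

In this toy setting, raising the threshold from 1 to 10 cuts the number of tested windows fivefold, so the raw p-value that five hits must reach is five times less strict. The windows removed are also the ones with too few reads to yield a small p-value in the first place.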

The minRowSum parameter appears to be working. I still haven't reached the number of hits MACS was returning (I also need to check which parameters they were using for MACS), but adjusting it yielded immediate results.

I had built my scripts using a MEDIPS tutorial here (http://www.bioconductor.org/packages/2.12/bioc/vignettes/MEDIPS/inst/doc/MEDIPS.pdf). The example in this tutorial sets the minRowSum parameter to 1 and claims that this is the default value. Setting it to 10 (the default according to your post) resulted in several new hits. Is that tutorial wrong or out of date? Based on what I know now, advising a minRowSum value of 1 seems like bad advice.

I'll continue tweaking minRowSum and window size to values more appropriate to the data. Thanks for your help.

written 3.9 years ago by tptacek3050