Question

MEDIPS genome Vs createROIset significance issue

peter.mcerlean • 0

Hi Lukas,

I've been using MEDIPS to determine differential coverage across enhancer regions identified by ROSE from control (x2) and disease (x3) H3K27ac data:

MEDIPS 1kb windows P<0.05: 101,691.

MEDIPS 1kb windows adjP<0.05: 0

Enhancer Regions: 294

I then intersect the data to see the density of significant windows in each enhancer:

Sig windows in enhancers: 1,043

Enhancers containing >1 sig window: 243

However, when I use my enhancers in createROIset analysis I get

Enhancers P<0.05: 100

Enhancers AdjP<0.05: 20

And quite confusingly, some enhancers which did not contain any significant 1kb windows are now themselves significant!?

I realize that the difference may be due to the normalization at just my enhancers Vs genome.

However, which dataset should I trust? The enhancers as ROIs or the density of significant windows within enhancers?

Kind Regards

Peter

Workflow:

Disease/Control = MEDIPS.createSet(BSgenome = "BSgenome.Hsapiens.UCSC.hg19", uniq = 1, extend = 120, shift = 0, window_size = 1000, chr.select = paste0("chr", c(1:22, "X", "Y")))

Genome = MEDIPS.meth(MSet1 = Disease, MSet2 = Control, ISet1 = Disease.Input, ISet2 = Control.Input, p.adj = "bonferroni", diff.method = "edgeR", minRowSum = 10, MeDIP = FALSE, quantile = TRUE)

Disease/Control.Enhancers = MEDIPS.createROIset(ROI = Enhancers, BSgenome = "BSgenome.Hsapiens.UCSC.hg19", uniq = 1, extend = 120, shift = 0, bn = 1)

Enhancers.diff = MEDIPS.meth(MSet1 = Disease.Enhancers, MSet2 = Control.Enhancers, ISet1 = Disease.Enhancers.Input, ISet2 = Control.Enhancers.Input, p.adj = "bonferroni", diff.method = "edgeR", minRowSum = 10, MeDIP = FALSE, quantile = TRUE)
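
For readers following along, the intersection step described in the question (counting significant 1kb windows per enhancer) can be sketched in base R. This is only an illustration under stated assumptions: the data frames `sig_windows` and `enhancers`, their coordinates, and the helper `count_sig_windows` are all hypothetical and not part of the original workflow. In practice, `GenomicRanges::countOverlaps` is the usual tool for this.

```r
# Hypothetical toy data: significant 1 kb windows and enhancer intervals
# (1-based, closed coordinates). None of these values come from the post.
sig_windows <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr2"),
  start = c(1001, 2001, 9001, 5001),
  end   = c(2000, 3000, 10000, 6000)
)
enhancers <- data.frame(
  name  = c("enh_A", "enh_B", "enh_C"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(500, 20000, 5500),
  end   = c(3500, 30000, 7000)
)

# Count the windows overlapping each enhancer: same chromosome and
# intersecting intervals.
count_sig_windows <- function(enh, win) {
  vapply(seq_len(nrow(enh)), function(i) {
    sum(win$chr == enh$chr[i] &
          win$start <= enh$end[i] &
          win$end >= enh$start[i])
  }, integer(1))
}

enhancers$n_sig <- count_sig_windows(enhancers, sig_windows)
print(enhancers)

# Number of enhancers containing at least one significant window
n_with_sig <- sum(enhancers$n_sig > 0)
print(n_with_sig)
```

The per-enhancer count is a density measure only. It says nothing about whether the enhancer as a whole changes, which is the distinction the rest of the thread turns on.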

medips r • 1.6k views

ADD COMMENT • link 8.2 years ago peter.mcerlean • 0

score 0 · Answer 1 · 2016-02-21

Dear Peter,

as you are using ROSE, I assume that you are in fact dealing with super enhancers (SEs), which can be very long. What is the size distribution of your SEs? What is the biological definition of an SE, and what exactly do you want to test by a differential SE enrichment analysis? Do you want to test whether the entire SE 'is present or more active' in one of two different conditions? If this is what you want to test for, the SE ROI approach might be ok.

Personally, I think that there are regions within SEs which can have distinct epigenetic dynamics. I am currently working on an approach where I define differential super enhancers by first searching for differentially enriched regions (DERs, at small windows) and by subsequently stitching and ranking these DERs according to the SE definition.

I would not necessarily say that the normalisation makes the greatest difference between the two approaches you tested. It's more that you have a small sample size, and by testing many 1kb windows, multiple testing has a strong impact on your results. On the other hand, it's not surprising that you get some SEs as significantly differentially enriched even if they contain no individually significant 1kb windows. This is because you have much higher count values within long SEs compared to small windows, resulting in increased statistical power.

All the best,
Lukas
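
The two effects described in this answer can be illustrated with a small base R sketch. It is not what MEDIPS or edgeR compute internally. The raw p-value, the number of tested windows (`n_win`), and the read counts are assumptions chosen for illustration; only the enhancer count of 294 comes from the question. The binomial test stands in for a count-based test under the simplifying assumption of equal library sizes and no replicate variability.

```r
# --- Effect 1: Bonferroni correction scales with the number of tests ---
p_raw <- 1e-5   # hypothetical raw p-value for one region
n_enh <- 294    # number of enhancer ROIs (from the question)
n_win <- 1e6    # ASSUMED number of tested 1 kb windows genome-wide

p_adj_roi <- p.adjust(p_raw, method = "bonferroni", n = n_enh)
p_adj_win <- p.adjust(p_raw, method = "bonferroni", n = n_win)
print(c(roi = p_adj_roi, windows = p_adj_win))

# --- Effect 2: the same fold change is easier to detect at higher counts ---
# Conditional on the total, reads split 50/50 between two conditions under
# the null hypothesis. Both cases below have a 1.5-fold ratio (15:10 and
# 1500:1000); only the depth differs.
p_small <- binom.test(15, 25, p = 0.5)$p.value      # 1 kb window-like counts
p_large <- binom.test(1500, 2500, p = 0.5)$p.value  # long SE-like counts
print(c(small = p_small, large = p_large))
```

Under these assumptions the same raw p-value survives correction across 294 regions but not across a million windows, and the same 1.5-fold ratio is only significant at the higher count depth.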

score 0 · Answer 2 · 2016-02-22

peter.mcerlean • 0

Hi Lukas,

Yes, I’ve been using ROSE to identify SEs and am looking at what’s happening to the SEs I’ve identified in my disease group. Not surprisingly, it seems that the disease SEs are found in regions of H3K27ac ‘gain’ rather than loss. What I wanted to do was to add some measure of significance to these gains, using an additional program as a way to confirm their presence.

However, I have found that regions within the same SE can exhibit quite large and significant gains or losses, which may represent, as you suggested, distinct epigenetic dynamics. This was at the heart of my confusion regarding the enhancer vs. genome windows analysis and which one I should use to confirm significance.

As per your suggestion, I’ve tweaked my minRowSum and looked at the data again:

MEDIPS 1kb Windows minRowSum=30:

P<0.05: 35,807

AdjP<0.05: 3

Sig windows in Enhancers: 919

Enhancers containing >1 Sig Window: 240

Enhancers as ROI w/minRowSum=30:

P<0.05: 99

AdjP<0.05: 20

It seems that while I’ve been able to tidy up the genome-wide analysis, the enhancer analysis still looks the same. I also failed to mention before that while the P values are stronger with the enhancer ROIs, the fold changes are not really comparable to what was found in the genome windows.

The SEs have an average size of 42kb but do range between 0.4-162kb which I admit is a tremendous spread. Is this dilution in fold change also due to there being more reads available in the SEs?

I’d be curious to see how your stitching approach works but in the meantime, any suggestions as to determining enhancer significance?

Kind Regards,

Peter
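
As a rough picture of why raising minRowSum changes the genome-wide numbers above but not the enhancer ROI numbers, the filter can be thought of as a threshold on each region's total count across all samples. The sketch below assumes that regions whose summed counts fall below the threshold are excluded from testing; the count matrix is made up, and the exact comparison MEDIPS applies at the boundary is not checked here.

```r
# Hypothetical read counts per region (rows) and sample (columns).
counts <- matrix(
  c(  0,   1,   0,   2,   1,   # nearly empty 1 kb window
      3,   4,   5,   6,   7,   # moderately covered 1 kb window
    210, 190, 340, 360, 330),  # long, highly covered SE-like region
  nrow = 3, byrow = TRUE,
  dimnames = list(c("win_low", "win_mid", "se_like"),
                  c("ctrl1", "ctrl2", "dis1", "dis2", "dis3"))
)

# ASSUMED filter: keep a region only if its total count reaches the threshold.
keep_10 <- rowSums(counts) >= 10
keep_30 <- rowSums(counts) >= 30

print(rbind(minRowSum_10 = keep_10, minRowSum_30 = keep_30))
```

Raising the threshold removes more sparse windows, and with them some of the multiple-testing burden, while a highly covered region passes either way.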

ADD COMMENT • link 8.2 years ago peter.mcerlean • 0

Dear Peter,

please see my comments inserted into your text below.

> It seems that while I’ve been able to tidy up the genome-wide analysis, the enhancer still looks the same.

The total number of SEs is typically small. Therefore, multiple testing correction does not have as strong an impact on the p-values as in a genome-wide approach. The minRowSum parameter is useful for excluding the many uncovered or low-coverage genomic regions before testing. SEs have high coverage by definition, and therefore a minRowSum threshold applied to SEs will not make a difference.

> I also failed to mention before that while the P values are stronger with enhancer ROIs, the fold changes are not really comparable to what was found in the genome windows.

Yes, fold changes of selected 1kb regions within a super enhancer of up to 162kb will be different from the fold change of the entire SE.

> The SEs have an average size of 42kb but do range between 0.4-162kb which I admit is a tremendous spread. Is this dilution in fold change also due to there being more reads available in the SEs?

No, it’s more likely because SEs also enclose genomic regions without any enrichment or without differences in enrichment.

> I’d be curious to see how your stitching approach works but in the meantime, any suggestions as to determining enhancer significance?

What exactly do you want to test? If you want to test whether an entire SE has a statistically significant change of coverage, regardless of what is going on at different subregions within the SE, then the ROI approach is valid. Whether it’s a valid biological question is something else.

All the best,
Lukas
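
The fold-change dilution discussed in this reply comes down to simple arithmetic, sketched here in base R. All numbers are hypothetical: a region of 42 windows (matching the 42kb average SE size mentioned in the thread) with a flat control coverage of 50 reads per window, of which only 5 windows gain 4-fold in disease.

```r
n_windows <- 42                # 1 kb windows in a hypothetical 42 kb SE
ctrl <- rep(50, n_windows)     # ASSUMED control reads per window
dis  <- ctrl                   # most windows are unchanged in disease
changed <- 1:5                 # only five windows gain signal
dis[changed] <- ctrl[changed] * 4

# Fold change seen by the window-level analysis at the changed windows
fc_windows <- dis[changed] / ctrl[changed]

# Fold change seen by the ROI-level analysis over the whole region
fc_region <- sum(dis) / sum(ctrl)

print(fc_windows)  # 4 at each changed window
print(fc_region)   # 2850 / 2100, roughly 1.36
```

The unchanged windows contribute reads to both conditions equally, so the region-level ratio is pulled toward 1 even though the changed windows differ strongly.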

ADD REPLY • link 8.2 years ago Lukas Chavez ▴ 570

Hi Lukas,

Thanks for the replies. I'm still grappling with the whole biological question, so I think I'm in favour of the significant windows approach over the whole-SE approach. My main motivation is to identify conserved motifs, so just by sheer size, I think the windows will give me a better chance of doing this than looking at the SEs overall.

Best,

Peter

ADD REPLY • link 8.2 years ago peter.mcerlean • 0