The default behavior of DiffBind when merging peaks from different samples is that peaks overlapping by at least 1 bp are merged. For example, peak chr1:100-300 will be merged with peak chr1:299-500. Is it possible to change this behavior, for example by requiring at least 50% overlap between the two peaks, so that the peaks in the above example would not be merged? If anyone knows how to do this in DiffBind, or with other R packages in a way that remains compatible with the rest of DiffBind, that would be great.
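To make the 50% criterion concrete, here is a small base R sketch that measures the overlap between two peaks. It assumes 1-based, closed coordinates (both ends included, as in GRanges), and `overlap_stats` is a hypothetical helper, not part of DiffBind.

```r
# Hypothetical helper (not part of DiffBind): overlap between two intervals,
# assuming 1-based, closed coordinates (start and end both included).
overlap_stats <- function(start1, end1, start2, end2) {
  # Number of shared base pairs (0 if the intervals do not overlap)
  bp <- max(0, min(end1, end2) - max(start1, start2) + 1)
  width1 <- end1 - start1 + 1
  width2 <- end2 - start2 + 1
  list(
    bp = bp,
    frac_of_first = bp / width1,
    frac_of_second = bp / width2,
    # Overlap relative to the wider peak: this is >= 0.5 only when the
    # overlap covers at least 50% of BOTH peaks ("reciprocal" overlap)
    reciprocal = bp / max(width1, width2)
  )
}

# The example from the question: chr1:100-300 vs chr1:299-500
s <- overlap_stats(100, 300, 299, 500)
s$bp          # 2 shared bp (positions 299 and 300)
s$reciprocal  # 2 / 202, about 0.0099, far below a 0.5 threshold
```

Under these assumptions the two peaks share only 2 bp, so any percentage-based rule would keep them separate.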
I'm adding another answer in response to AMA's comments regarding the $intersectMode parameter not being properly passed to summarizeOverlaps.
I can confirm that this is indeed a bug. Besides silently preventing the default mode from being changed, the default actually being used has been "Union", not "IntersectionNotEmpty" as documented. In most cases, where the consensus peakset contains only non-overlapping intervals, the two modes give the same counts. However, if the $mergeOverlap configuration parameter was set to a value greater than 1 (so that consensus peaks could overlap), the behavior may not have been as expected.
This bug has been fixed and checked in; it will appear in the next day or two as DiffBind_3.2.5.
That version also adds support for a new configuration parameter, $inter.feature, which, if present, will additionally be passed to summarizeOverlaps. This is documented in the help page for dba.count().
We are looking at exposing this feature in DiffBind, but currently there is no way to override the 1bp overlap.
If you are able to derive a merged consensus set separately using alternative criteria, note that you can pass it in to dba.count() and that set will be used for all subsequent steps (so long as the intervals you pass in do not themselves overlap).
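As one illustration, here is a base R sketch that builds a consensus set with a 50% reciprocal-overlap rule and then checks the non-overlap condition mentioned above. Everything in it is an assumption for illustration: the function names are hypothetical, coordinates are assumed 1-based and closed, and each new peak is compared with the growing merged interval rather than with every peak already in the cluster.

```r
# Hypothetical sketch (not DiffBind code): merge peaks only when their
# reciprocal overlap is at least `min_frac`. One row per peak, 1-based coords.
merge_by_fraction <- function(peaks, min_frac = 0.5) {
  peaks <- peaks[order(peaks$chr, peaks$start, peaks$end), ]
  out <- list()
  for (chr in unique(peaks$chr)) {
    p <- peaks[peaks$chr == chr, ]
    cs <- p$start[1]; ce <- p$end[1]       # current merged interval
    if (nrow(p) > 1) {
      for (i in 2:nrow(p)) {
        bp <- max(0, min(ce, p$end[i]) - max(cs, p$start[i]) + 1)
        frac <- bp / max(ce - cs + 1, p$end[i] - p$start[i] + 1)
        if (frac >= min_frac) {
          ce <- max(ce, p$end[i])            # extend the merged interval
        } else {
          out[[length(out) + 1]] <- data.frame(chr = chr, start = cs, end = ce)
          cs <- p$start[i]; ce <- p$end[i]   # start a new merged interval
        }
      }
    }
    out[[length(out) + 1]] <- data.frame(chr = chr, start = cs, end = ce)
  }
  do.call(rbind, out)
}

# TRUE if any two intervals on the same chromosome still overlap
has_internal_overlaps <- function(peaks) {
  peaks <- peaks[order(peaks$chr, peaks$start), ]
  any(unlist(lapply(split(peaks, peaks$chr), function(p) {
    if (nrow(p) < 2) return(FALSE)
    # compare each start with the furthest end seen so far
    p$start[-1] <= cummax(p$end)[-nrow(p)]
  })))
}

peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr1", "chr1"),
  start = c(100, 299, 1000, 1050),
  end   = c(300, 500, 1200, 1220)
)
consensus <- merge_by_fraction(peaks, min_frac = 0.5)
consensus                        # 100-300, 299-500, 1000-1220
has_internal_overlaps(consensus) # TRUE
```

Here the first two peaks share only 2 bp and stay separate, so the check returns TRUE: a percentage rule on its own does not guarantee a non-overlapping consensus. Those intervals would still need to be resolved before the data frame (chr, start, end) is handed to dba.count(), presumably through its peaks argument (see its help page).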
A version of this feature will be in the next release (scheduled for May 20). It is available in the development version starting from DiffBind_3.1.7.
A specific overlap value (in basepairs) can be specified by setting a configuration parameter:
DBA$config$mergeOverlap
The default is 1, meaning all peaks that overlap by at least 1 basepair will be merged. If you set it higher, for example to 100, peaks won't be merged unless they overlap by at least 100bp. Note that this means you can have separate consensus peaks that actually overlap, which may affect counting, since by default any read that overlaps more than one consensus peak is not counted. (You can control this with another configuration option, DBA$config$intersectMode.)
Negative values can also be used to specify that peaks that do not overlap, but are within a "gap" of a set number of basepairs of each other, will be merged.
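The rule described above can be sketched in base R as a comparison against a "signed overlap". This is a hypothetical model of the described behavior, not DiffBind's implementation; in particular, how DiffBind treats exact boundaries (book-ended peaks, off-by-one in the gap size) is an assumption here.

```r
# Hypothetical sketch of the mergeOverlap rule (not DiffBind's code).
# Intervals are 1-based and closed. The signed overlap is the number of
# shared bp when two peaks overlap, or minus the number of bp lying between
# them when they do not (book-ended peaks give 0).
signed_overlap <- function(start1, end1, start2, end2) {
  min(end1, end2) - max(start1, start2) + 1
}

# Merge when the signed overlap reaches merge_overlap:
#   merge_overlap = 1   -> at least 1 bp shared (the default)
#   merge_overlap = 100 -> at least 100 bp shared
#   merge_overlap = -10 -> merged even when up to 10 bp apart
would_merge <- function(start1, end1, start2, end2, merge_overlap = 1) {
  signed_overlap(start1, end1, start2, end2) >= merge_overlap
}

would_merge(100, 300, 299, 500)                       # TRUE: 2 bp shared
would_merge(100, 300, 299, 500, merge_overlap = 100)  # FALSE
would_merge(100, 200, 205, 300, merge_overlap = -10)  # TRUE: 4 bp apart
```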
There isn't an option to specify the overlap amount using a percentage, just a constant.
I installed the latest version of DiffBind and I'm trying to test your solution. I found that I can use mergeOverlap to configure the merging step. However, I'm not sure which intersectMode setting would count the reads without discarding the ones that overlap multiple (unmerged) peaks. I figured the counting is based on the summarizeOverlaps function, but when I inspected the DiffBind source code, I couldn't find where intersectMode is used! I saw that it is assigned, but it wasn't actually used in any counting function, including summarizeOverlaps. It seems the default is enforced no matter what value you give intersectMode.
Could you please let me know what you think, and whether it is still impossible to implement this idea using DiffBind?
After inspecting the summarizeOverlaps function, it seems the settings that work in my case are:
example <- summarizeOverlaps(gr, reads, mode="Union", inter.feature=FALSE)
However, I'm not sure whether it's possible to use these settings with DiffBind.
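To see what inter.feature=FALSE changes in the call above, here is a base R toy model of Union-mode counting as described in the summarizeOverlaps documentation: with inter.feature=TRUE a read overlapping more than one feature is dropped, and with FALSE it is counted once for every feature it overlaps. This is a simplified illustration (one chromosome, unstranded, reads as plain intervals), not the GenomicAlignments implementation, and `count_union` is a hypothetical name.

```r
# Toy model of summarizeOverlaps(mode = "Union") counting, NOT the
# GenomicAlignments implementation: one chromosome, unstranded,
# features and reads given as 1-based closed intervals.
count_union <- function(features, reads, inter_feature = TRUE) {
  # hits[i, j] is TRUE when read i overlaps feature j by at least 1 bp
  hits <- outer(reads$start, features$end, "<=") &
    outer(reads$end, features$start, ">=")
  if (inter_feature) {
    # Union mode: reads overlapping more than one feature are discarded
    hits[rowSums(hits) > 1, ] <- FALSE
  }
  counts <- colSums(hits)
  names(counts) <- features$name
  counts
}

# Two consensus peaks that overlap (e.g. after raising mergeOverlap)
features <- data.frame(name = c("peakA", "peakB"),
                       start = c(100, 250), end = c(300, 500))
reads <- data.frame(start = c(120, 260, 400),
                    end   = c(170, 310, 450))  # the 2nd read hits both peaks

count_union(features, reads, inter_feature = TRUE)   # peakA 1, peakB 1
count_union(features, reads, inter_feature = FALSE)  # peakA 2, peakB 2
```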
Thanks a lot Rory Stark for your quick response. I'm wondering when DiffBind_3.2.5 will be available, or whether there is a repo on GitHub I can use to update what I have.
Looks like it just went live on the Bioconductor site!
Is there a way to install this version on R 3.6.2?
I don't think so. Bioconductor releases have been dependent on R 4.x for well over a year, and there is no way to go back and change a release prior to that. If you have a build environment, you could install from the tar.gz, but there are many dependencies that would also need to be compatible.