Hello,
I'm annotating some ATAC peaks with ChIPseeker package, which I find very helpful.
I was trying to replicate the numbers seen in the Vennplots with my own code using intersect()
and findOverlaps()
functions from the GRanges package, and was a bit surprised to find that the numbers given by these functions and the overlap()
function in ChIPseeker package didn't match.
I have 3 peaksets: peaks1
, peaks2
, and peaks3
. To find the peaks that are shared between all samples I use
shared = Reduce(intersect, list(peaks1,peaks2,peaks3) #This is similar to what overlap() does
However, when I move along to fnd overlaps between peaks1
and peaks2
by:
peaks12 = intersect(peaks1,peaks2)[-findOverlaps(intersect(peaks1,peaks2), shared)@from]
ChIPseeker disagrees. I think this is because ChIPseeker uses simple subtraction:
length(intersect(peaks1,peaks2))-length(shared)
This calculation doesn't take into account that there may be multiple ranges within shared
that overlap one element in intersect(peaks1,peaks2)
leading to underestimation of the number of shared peaks between peaks1
and peaks2
.
This is probably not the core function of the package and it might be even the way the author has purposely made it to work, but it lead to some confusion at least on my part. Any comments are appreciated, especially if I'm doing something wrong in here!
Cheers, -Konsta