Question: ChIPpeakAnno :findOverlapsOfPeaks: number for overlap peak in venn diagram is different than output of ol1$overlappingPeaks
gravatar for sudhirjadhao2009
5 months ago by
sudhirjadhao20090 wrote:

I am finding overlapping region in two chipseq samples. The number of overlapping peaks in ven diagram are different from what you get in  ol1$overlappingPeaks and ol1$peaksInMergedPeaks. For specific peak the ven diagram number is same as in ol1$peaklist. But not for overlapping peaks.

In example data provided by tool, I found same number in overlapping region and in ol$peaksInMergedPeaks.

command used:

NCEtOHR1S17.bed <- toGRanges(files[3], format="BED", header=FALSE)


si8444DHTR1S16.bed <- toGRanges(files[9], format="BED", header=FALSE) 


GRanges object with 6 ranges and 1 metadata column:
                                        seqnames           ranges strand |     score
                                           <Rle>        <IRanges>  <Rle> | <numeric>
   NC_EtOH_R1_S17_sorted.bam.bam_peak_1     chr1 [ 10023,  10215]      * |         7
   NC_EtOH_R1_S17_sorted.bam.bam_peak_2     chr1 [526379, 526773]      * |        22
  NC_EtOH_R1_S17_sorted.bam.bam_peak_3b     chr1 [564391, 565477]      * |        35
  NC_EtOH_R1_S17_sorted.bam.bam_peak_4b     chr1 [565600, 566838]      * |        34
   NC_EtOH_R1_S17_sorted.bam.bam_peak_5     chr1 [567182, 567748]      * |        44
  NC_EtOH_R1_S17_sorted.bam.bam_peak_6a     chr1 [567813, 568982]      * |        32
  seqinfo: 24 sequences from an unspecified genome; no seqlengths


GRanges object with 6 ranges and 1 metadata column:
                                            seqnames           ranges strand |     score
                                               <Rle>        <IRanges>  <Rle> | <numeric>
   si844_4_DHT_R1_S16_sorted.bam.bam_peak_1     chr1 [  9969,  10370]      * |        10
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_2c     chr1 [564398, 565440]      * |        38
   si844_4_DHT_R1_S16_sorted.bam.bam_peak_3     chr1 [565535, 565905]      * |        25
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_4a     chr1 [566068, 566824]      * |        37
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_5b     chr1 [567219, 567773]      * |        61
  si844_4_DHT_R1_S16_sorted.bam.bam_peak_6a     chr1 [567841, 568626]      * |        24
  seqinfo: 24 sequences from an unspecified genome; no seqlengths


ol1 <-findOverlapsOfPeaks(NCEtOHR1S17.bed, si8441DHTR1S18)

ol1 <- addMetadata(ol1, colNames="score", FUN=mean)


     NCEtOHR1S17.bed si8444DHTR1S16 Counts
[1,]               0              0      0
[2,]               0              1   1382
[3,]               1              0   7566
[4,]               1              1   8023
[1] "VennCounts"

> nrow($overlappingPeaks))
[1] 8270
> nrow($mergedPeaks))
[1] 7974






ADD COMMENTlink modified 5 months ago by Ou, Jianhong990 • written 5 months ago by sudhirjadhao20090
gravatar for Ou, Jianhong
5 months ago by
Ou, Jianhong990
United States
Ou, Jianhong990 wrote:


This question has been asked many times. Please refer the answer in C: Curious behaviour of ChIPpeakAnno

You can have a try to turn on the argument connectedPeaks="keepAll" to see how the numbers change.

Basically, the merged peaks may be different from the original peaks because they are merged from two or more peaks since they have overlaps. peaksInMergedPeaks will tell you the original peaks before merging.  peaklist indicates the peaks for each category. overlappingPeaks are merged results for pair overlapping peaks. 

Let me know if there are any question.

ADD COMMENTlink written 5 months ago by Ou, Jianhong990

Thank you for replay.

I got your point . but whatever number is common,  I want to extract id of those common peaks

Below is the summary of my data.

                                              Length Class   Mode
si8444DHTR1S16                                 803   GRanges S4  
si8441DHTR1S18                                2188   GRanges S4  
si8441DHTR1S18///si8444DHTR1S16                552   GRanges S4  
NCEtOHR1S17                                   4561   GRanges S4  
NCEtOHR1S17///si8444DHTR1S16                   584   GRanges S4  
NCEtOHR1S17///si8441DHTR1S18                  2973   GRanges S4  
NCEtOHR1S17///si8441DHTR1S18///si8444DHTR1S16 7342   GRanges S4  


1. I want to extract  Id of  peaks which are  contributing to this number 7342 . Can you please tell me how i should process

2. In above summary, common number is 7342 . but in Venn diagram the common number is different(7415). Can you please tell me why I am getting  different numbers for overlap

for details of Image please follow below line





ADD REPLYlink written 5 months ago by sudhirjadhao20090


I will show some sample codes to answer your question. Hope this can help you to understand what happened for the Venn-diagram.

For question 1:

There is a column in the metadata of each element in peaklist, called peakNames, which is a CharacterList. The CharacterList is a list of the contributing peaks ids with prefix, eg. peaks1__peakname1, peaksi__peaknamej. Users can access the original peak name by split those characters.

bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno")
gr1 <- toGRanges(bed, format="BED", header=FALSE)
gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno")
gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3)
ol <- findOverlapsOfPeaks(gr1, gr2)
peakNames <- ol$peaklist[['gr1///gr2']]$peakNames
peakNames1 <- melt(peakNames,"")
peakNames1 <- cbind(peakNames1[, 1],, strsplit(as.character(peakNames1[, 3]), "__")))
colnames(peakNames1) <- c("", "group", "peakName")
gr1.subset <- gr1[peakNames1[peakNames1[, "group"] %in% "gr1", "peakName"]]
gr2.subset <- gr2[peakNames1[peakNames1[, "group"] %in% "gr2", "peakName"]]

More easier way to access the original peaks is by using the renames peaks.

all.peaks <- ol$all.peaks
gr1.renamed <- all.peaks$gr1
gr2.renamed <- all.peaks$gr2
peakNames <- melt(ol$peaklist[['gr1///gr2']]$peakNames,"")
gr1.sub <- gr1.renamed[peakNames[grepl("^gr1", peakNames[, 3]), 3]]
gr2.sub <- gr2.renamed[peakNames[grepl("^gr2", peakNames[, 3]), 3]]

For question 2:

There is a argument called connectedPeaks. In the documentation, we described the argument connectedPeaks as If multiple peaks involved in overlapping in several groups, set it to "merge" will count it as 1, while set it to "min" will count it as the minimal involved peaks in any group of connected/overlapped peaks. The default is “min”. By default, the program will select the minimal number from each peaklist involved in one merged peak. So the number will be no less than the number in the peaklist. If user set connectedPeaks to merge, the number will be exactly same as the number in the peaklist. It is not easy to understand above description. I provide a simple example for you to understand what is difference between them:

p1 <- GRanges("1", IRanges(c(1, 4, 7), width=2))
p2 <- GRanges("1", IRanges(c(2, 5), width=3))
ol_min <- findOverlapsOfPeaks(p1, p2, connectedPeaks="min") ## the counts will be the minimal peaks involved in that group of connected peaks, so you get 2.
ol_merge <- findOverlapsOfPeaks(p1, p2, connectedPeaks="merge") ## the counts will be 1 for each group of connected peaks

I paste the output here:

> ol_min$venn_cnt
     p1 p2 Counts
[1,]  0  0      0
[2,]  0  1      0
[3,]  1  0      0
[4,]  1  1      2
[1] "VennCounts"
> ol_merge$venn_cnt
     p1 p2 Counts
[1,]  0  0      0
[2,]  0  1      0
[3,]  1  0      0
[4,]  1  1      1
[1] "VennCounts"


Let me know if you still have any question.

ADD REPLYlink written 5 months ago by Ou, Jianhong990
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 122 users visited in the last hour