Search
Question: ChIPpeakAnno :findOverlapsOfPeaks: number for overlap peak in venn diagram is different than output of ol1$overlappingPeaks 0 14 months ago by sudhirjadhao20090 wrote: I am finding overlapping region in two chipseq samples. The number of overlapping peaks in ven diagram are different from what you get in ol1$overlappingPeaks and ol1$peaksInMergedPeaks. For specific peak the ven diagram number is same as in ol1$peaklist. But not for overlapping peaks.

In example data provided by tool, I found same number in overlapping region and in ol$peaksInMergedPeaks. command used: NCEtOHR1S17.bed <- toGRanges(files[3], format="BED", header=FALSE) si8444DHTR1S16.bed <- toGRanges(files[9], format="BED", header=FALSE) head(NCEtOHR1S17.bed) GRanges object with 6 ranges and 1 metadata column: seqnames ranges strand | score <Rle> <IRanges> <Rle> | <numeric> NC_EtOH_R1_S17_sorted.bam.bam_peak_1 chr1 [ 10023, 10215] * | 7 NC_EtOH_R1_S17_sorted.bam.bam_peak_2 chr1 [526379, 526773] * | 22 NC_EtOH_R1_S17_sorted.bam.bam_peak_3b chr1 [564391, 565477] * | 35 NC_EtOH_R1_S17_sorted.bam.bam_peak_4b chr1 [565600, 566838] * | 34 NC_EtOH_R1_S17_sorted.bam.bam_peak_5 chr1 [567182, 567748] * | 44 NC_EtOH_R1_S17_sorted.bam.bam_peak_6a chr1 [567813, 568982] * | 32 ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths head(si8444DHTR1S16) GRanges object with 6 ranges and 1 metadata column: seqnames ranges strand | score <Rle> <IRanges> <Rle> | <numeric> si844_4_DHT_R1_S16_sorted.bam.bam_peak_1 chr1 [ 9969, 10370] * | 10 si844_4_DHT_R1_S16_sorted.bam.bam_peak_2c chr1 [564398, 565440] * | 38 si844_4_DHT_R1_S16_sorted.bam.bam_peak_3 chr1 [565535, 565905] * | 25 si844_4_DHT_R1_S16_sorted.bam.bam_peak_4a chr1 [566068, 566824] * | 37 si844_4_DHT_R1_S16_sorted.bam.bam_peak_5b chr1 [567219, 567773] * | 61 si844_4_DHT_R1_S16_sorted.bam.bam_peak_6a chr1 [567841, 568626] * | 24 ------- seqinfo: 24 sequences from an unspecified genome; no seqlengths ol1 <-findOverlapsOfPeaks(NCEtOHR1S17.bed, si8441DHTR1S18) ol1 <- addMetadata(ol1, colNames="score", FUN=mean) makeVennDiagram(ol1)$vennCounts
NCEtOHR1S17.bed si8444DHTR1S16 Counts
[1,]               0              0      0
[2,]               0              1   1382
[3,]               1              0   7566
[4,]               1              1   8023
attr(,"class")
[1] "VennCounts"

> nrow(as.data.frame(ol1$overlappingPeaks)) [1] 8270 > nrow(as.data.frame(ol1$mergedPeaks))
[1] 7974

modified 14 months ago by Ou, Jianhong1.1k • written 14 months ago by sudhirjadhao20090
0
14 months ago by
Ou, Jianhong1.1k
United States
Ou, Jianhong1.1k wrote:

Hi,

This question has been asked many times. Please refer the answer in C: Curious behaviour of ChIPpeakAnno

You can have a try to turn on the argument connectedPeaks="keepAll" to see how the numbers change.

Basically, the merged peaks may be different from the original peaks because they are merged from two or more peaks since they have overlaps. peaksInMergedPeaks will tell you the original peaks before merging.  peaklist indicates the peaks for each category. overlappingPeaks are merged results for pair overlapping peaks.

Let me know if there are any question.

Thank you for replay.

I got your point . but whatever number is common,  I want to extract id of those common peaks

Below is the summary of my data.

summary(ol$peaklist) Length Class Mode si8444DHTR1S16 803 GRanges S4 si8441DHTR1S18 2188 GRanges S4 si8441DHTR1S18///si8444DHTR1S16 552 GRanges S4 NCEtOHR1S17 4561 GRanges S4 NCEtOHR1S17///si8444DHTR1S16 584 GRanges S4 NCEtOHR1S17///si8441DHTR1S18 2973 GRanges S4 NCEtOHR1S17///si8441DHTR1S18///si8444DHTR1S16 7342 GRanges S4 1. I want to extract Id of peaks which are contributing to this number 7342 . Can you please tell me how i should process 2. In above summary, common number is 7342 . but in Venn diagram the common number is different(7415). Can you please tell me why I am getting different numbers for overlap for details of Image please follow below line https://drive.google.com/open?id=0B5CeCVgbLqijajBVZjlMWnNMVGM ADD REPLYlink written 14 months ago by sudhirjadhao20090 Hi, I will show some sample codes to answer your question. Hope this can help you to understand what happened for the Venn-diagram. For question 1: There is a column in the metadata of each element in peaklist, called peakNames, which is a CharacterList. The CharacterList is a list of the contributing peaks ids with prefix, eg. peaks1__peakname1, peaksi__peaknamej. Users can access the original peak name by split those characters. library(ChIPpeakAnno) bed <- system.file("extdata", "MACS_output.bed", package="ChIPpeakAnno") gr1 <- toGRanges(bed, format="BED", header=FALSE) gff <- system.file("extdata", "GFF_peaks.gff", package="ChIPpeakAnno") gr2 <- toGRanges(gff, format="GFF", header=FALSE, skip=3) ol <- findOverlapsOfPeaks(gr1, gr2) peakNames <- ol$peaklist[['gr1///gr2']]$peakNames library(reshape2) peakNames1 <- melt(peakNames, value.name="merged.peak.id") peakNames1 <- cbind(peakNames1[, 1], do.call(rbind, strsplit(as.character(peakNames1[, 3]), "__"))) colnames(peakNames1) <- c("merged.peak.id", "group", "peakName") head(peakNames1) gr1.subset <- gr1[peakNames1[peakNames1[, "group"] %in% "gr1", "peakName"]] gr2.subset <- gr2[peakNames1[peakNames1[, "group"] %in% "gr2", "peakName"]] More easier way to access the original peaks is by using the renames peaks. all.peaks <- ol$all.peaks
gr1.renamed <- all.peaks$gr1 gr2.renamed <- all.peaks$gr2
peakNames <- melt(ol$peaklist[['gr1///gr2']]$peakNames, value.name="merged.peak.id")
gr1.sub <- gr1.renamed[peakNames[grepl("^gr1", peakNames[, 3]), 3]]
gr2.sub <- gr2.renamed[peakNames[grepl("^gr2", peakNames[, 3]), 3]]

For question 2:

There is a argument called connectedPeaks. In the documentation, we described the argument connectedPeaks as If multiple peaks involved in overlapping in several groups, set it to "merge" will count it as 1, while set it to "min" will count it as the minimal involved peaks in any group of connected/overlapped peaks. The default is “min”. By default, the program will select the minimal number from each peaklist involved in one merged peak. So the number will be no less than the number in the peaklist. If user set connectedPeaks to merge, the number will be exactly same as the number in the peaklist. It is not easy to understand above description. I provide a simple example for you to understand what is difference between them:

p1 <- GRanges("1", IRanges(c(1, 4, 7), width=2))
p2 <- GRanges("1", IRanges(c(2, 5), width=3))
ol_min <- findOverlapsOfPeaks(p1, p2, connectedPeaks="min") ## the counts will be the minimal peaks involved in that group of connected peaks, so you get 2.
ol_merge <- findOverlapsOfPeaks(p1, p2, connectedPeaks="merge") ## the counts will be 1 for each group of connected peaks
ol_min$venn_cnt ol_merge$venn_cnt

I paste the output here:

> ol_min$venn_cnt p1 p2 Counts [1,] 0 0 0 [2,] 0 1 0 [3,] 1 0 0 [4,] 1 1 2 attr(,"class") [1] "VennCounts" > ol_merge$venn_cnt
p1 p2 Counts
[1,]  0  0      0
[2,]  0  1      0
[3,]  1  0      0
[4,]  1  1      1
attr(,"class")
[1] "VennCounts"

Let me know if you still have any question.