mforoo1


I have a question about Figure 13 in the DiffBind manual. According to this part (page 4):

> tamoxifen <- dba(sampleSheet="tamoxifen.csv")

3 MCF71 MCF7 ER Responsive Full-Media 1 raw 1513

This shows 1513 peaks for sample MCF71. But when we draw the binding site overlaps (Figure 13) and add up all the numbers for sample MCF71, we do not get the same number of peaks.

Numbers for MCF71 in the Venn diagram: 359 + 43 + 230 + 885 = 1517

I see the same thing with my own samples:

 1 WTC1  Shoot H3K4me3        WT         C         1    bed     18703

I have 18703 peaks in sample WTC1, but if I add up the numbers for WTC1 in my Venn diagram, I do not get the same number: 459 + 293 + 602 + 16934 = 18288.

Rory Stark
CRUK, Cambridge, UK

There are two things going on here. One is an error for which I am checking in a fix. The other is the way it is supposed to work.

The error is that the peak data included with the package does not exactly match the data used in the prebuilt objects (e.g., tamoxifen_peaks). So you are seeing 1513 peaks for this sample, while if you load the prebuilt data you would see 1556 peaks, which is what was used to generate Figure 13.

You can try this yourself:

> data(tamoxifen_peaks)
> tamoxifen
11 Samples, 2845 sites in matrix (3795 total):
       ID Tissue Factor  Condition  Treatment Replicate Caller Intervals
1  BT4741  BT474     ER  Resistant Full-Media         1    bed      1080
2  BT4742  BT474     ER  Resistant Full-Media         2    bed      1122
3   MCF71   MCF7     ER Responsive Full-Media         1    bed      1556
4   MCF72   MCF7     ER Responsive Full-Media         2    bed      1046
5   MCF73   MCF7     ER Responsive Full-Media         3    bed      1339
6   T47D1   T47D     ER Responsive Full-Media         1    bed       527
7   T47D2   T47D     ER Responsive Full-Media         2    bed       373
8  MCF7r1   MCF7     ER  Resistant Full-Media         1    bed      1438
9  MCF7r2   MCF7     ER  Resistant Full-Media         2    bed       930
10  ZR751   ZR75     ER Responsive Full-Media         1    bed      2346
11  ZR752   ZR75     ER Responsive Full-Media         2    bed      2345

Here you see that sample MCF71 has 1556 peaks. But as you point out, the Venn diagram in Figure 13 adds up to 1517 peaks, which is fewer.

There are fewer peaks in the overlaps because some peak merging has taken place. Suppose there are two nearby peaks in MCF71 that overlap with a single peak in MCF72:

MCF71     |-------|       |-------|
MCF72         |-------------|

How many peaks are in the overlap between the two samples? Is it 1 peak, or two peaks, or three peaks? The way DiffBind handles this is to merge overlapping peaks into the widest area that encompasses all the overlapping peaks:

MCF71     |-------|       |-------|
MCF72         |-------------|
MERGED    |-----------------------|

So it will count this as 1 overlapping region. For this reason, the number of merged peaks is always less than or equal to the number of original peaks.
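To make the counting concrete, here is a minimal plain-Python sketch of the merging idea. It is not DiffBind's implementation. The coordinates are hypothetical (read off the ASCII diagram, plus one extra isolated MCF71 peak), and treating intervals that merely touch as overlapping is an assumption. The sketch merges all peaks from all samples into spanning regions, then counts how many merged regions each sample has at least one peak in. That per-sample count is what the Venn diagram sums to, and it can never exceed the sample's original peak count.

```python
from typing import Dict, List, Tuple

# Hypothetical closed coordinates (start, end); not real peak data.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals into the widest regions that span them."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        # Assumption: an interval starting at or before the current region's
        # end overlaps it (touching intervals are merged too).
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def regions_per_sample(peaks: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Count the merged regions that contain at least one peak from each sample."""
    all_peaks = [iv for ivs in peaks.values() for iv in ivs]
    merged = merge_intervals(all_peaks)
    return {
        sample: sum(
            1
            for ms, me in merged
            if any(s <= me and e >= ms for s, e in ivs)
        )
        for sample, ivs in peaks.items()
    }


# Two MCF71 peaks bridged by one MCF72 peak (as in the diagram),
# plus a hypothetical third MCF71 peak that overlaps nothing.
peaks = {
    "MCF71": [(10, 18), (26, 34), (50, 60)],
    "MCF72": [(14, 28)],
}

print(merge_intervals([iv for ivs in peaks.values() for iv in ivs]))
# [(10, 34), (50, 60)]
print(regions_per_sample(peaks))
# {'MCF71': 2, 'MCF72': 1}
```

Here MCF71 starts with 3 peaks but appears in only 2 merged regions, because its first two peaks collapse into one region via the MCF72 peak. The same effect explains why the Venn diagram totals for MCF71 and WTC1 come out lower than their raw peak counts.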




