Question: DiffBind dba.plotVenn - Venn doesn't match Intervals
5.0 years ago by
t.severson wrote:

Hello all, when I use dba.plotVenn in DiffBind to draw Venn diagrams of my peaksets, the counts in the diagram don't match the Intervals column of the sample summary.

> tumors <- dba(sampleSheet='file.csv',minOverlap=F)
AFTER001_pre Breast ER pre  1 bed
AFTER001_post Breast ER post  1 bed

> tumors
17 Samples, 24627 sites in matrix:
              ID Tissue Factor Condition Replicate Peak.caller Intervals
1   AFTER001_pre Breast     ER       pre         1         bed      2071
8  AFTER001_post Breast     ER      post         1         bed      3381

But the Venn diagram generated has 1204 AFTER001_pre-only peaks, 242 AFTER001_post-only peaks and 839 overlapping peaks. 1204 + 839 = 2043, not 2071. Does anyone know what I'm doing wrong?

Thanks, Tesa



Sorry, I forgot session info.

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] DiffBind_1.6.2       GenomicRanges_1.12.5 IRanges_1.18.3
[4] BiocGenerics_0.6.0   BiocInstaller_1.10.3

loaded via a namespace (and not attached):
 [1] amap_0.8-12        bitops_1.0-6       caTools_1.17.1     edgeR_3.2.4
 [5] gdata_2.13.3       gplots_2.14.2      gtools_3.4.1       KernSmooth_2.23-13
 [9] limma_3.16.8       RColorBrewer_1.0-5 stats4_3.1.1       tools_3.1.1
[13] zlibbioc_1.6.0


written 5.0 years ago by t.severson
Answer: DiffBind dba.plotVenn - Venn doesn't match Intervals
5.0 years ago by
Gord Brown wrote:

Hi, Tesa,

You're most likely not doing anything wrong. To make the Venn diagram, DiffBind merges the peak sets of the samples, using a not-very-sophisticated algorithm that merges regions if they overlap at all. In your case, there are probably instances where two regions in one sample overlap the same region in the other. They'll all be merged into one big region, which is then counted only once, hence the numbers won't add up.
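To make the effect concrete, here is a minimal base R sketch on invented toy coordinates. It is not DiffBind's actual code: `merge_overlapping` and `touches` are hypothetical helpers written for this illustration, and it assumes a single chromosome with closed intervals. On real data, `GenomicRanges::reduce()` performs a comparable merge.

```r
# Toy peaks on one chromosome; all coordinates are invented for illustration.
pre  <- data.frame(start = c(100, 250, 900), end = c(200, 350, 1000))
post <- data.frame(start = c(150, 2000),     end = c(300, 2100))

# Hypothetical helper: merge any intervals that overlap at all.
merge_overlapping <- function(start, end) {
  ord   <- order(start, end)
  start <- start[ord]
  end   <- end[ord]
  # A new merged region begins where a start lies beyond every earlier end.
  running_end <- cummax(end)
  new_region  <- c(TRUE, start[-1] > running_end[-length(running_end)])
  id <- cumsum(new_region)
  data.frame(start = as.numeric(tapply(start, id, min)),
             end   = as.numeric(tapply(end,   id, max)))
}

# Pool both samples and merge, as a simple overlap-based merge would.
merged <- merge_overlapping(c(pre$start, post$start), c(pre$end, post$end))

# Hypothetical helper: does each merged region contain a peak from `peaks`?
touches <- function(regions, peaks) {
  vapply(seq_len(nrow(regions)), function(i) {
    any(peaks$start <= regions$end[i] & peaks$end >= regions$start[i])
  }, logical(1))
}

in_pre  <- touches(merged, pre)
in_post <- touches(merged, post)

# Venn counts are counts of merged regions, not of original peaks.
venn <- c(pre_only  = sum(in_pre & !in_post),
          shared    = sum(in_pre & in_post),
          post_only = sum(!in_pre & in_post))
print(venn)
print(nrow(pre))
```

In this toy case the two `pre` peaks at 100-200 and 250-350 are bridged by the `post` peak at 150-300, so all three collapse into one shared region. The Venn counts for `pre` then sum to 2 even though `pre` has 3 intervals, which is the same kind of mismatch as in the question.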

Now and again we talk about more clever algorithms (perhaps employing PeakSplitter or something along those lines) but it's never made it to the top of our to-do list, alas. Alternatively, we could report two numbers in the overlapping region, one for each sample. But we haven't done that either... :(
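As a rough idea of what reporting two numbers for the overlap could look like, here is a small base R sketch. It is only an illustration under stated assumptions, not something DiffBind provides: the coordinates are invented, `count_overlapping` is a hypothetical helper, and it counts direct overlaps between original peaks rather than membership in a merged region.

```r
# Toy peaks on one chromosome; all coordinates are invented for illustration.
pre  <- data.frame(start = c(100, 250, 900), end = c(200, 350, 1000))
post <- data.frame(start = c(150, 2000),     end = c(300, 2100))

# Hypothetical helper: number of peaks in `a` overlapping any peak in `b`.
count_overlapping <- function(a, b) {
  hits <- vapply(seq_len(nrow(a)), function(i) {
    any(b$start <= a$end[i] & b$end >= a$start[i])
  }, logical(1))
  sum(hits)
}

# One overlap count per sample instead of a single merged count.
overlap_counts <- c(pre  = count_overlapping(pre, post),
                    post = count_overlapping(post, pre))

# Sample-specific counts are whatever is left over in each sample.
only_counts <- c(pre  = nrow(pre)  - overlap_counts[["pre"]],
                 post = nrow(post) - overlap_counts[["post"]])

print(overlap_counts)
print(only_counts)
```

With per-sample counts, each sample's "only" and "overlapping" numbers add back up to its own interval total, at the cost of the overlap no longer being a single number.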

Hope this helps... or at least explains what's happening.


 - Gord


Thanks for your reply, Gord. That makes sense. I've used the package quite a bit and had never seen it happen so it was a bit concerning. Now I understand.



written 5.0 years ago by t.severson
