DiffBind dba.plotVenn - Venn doesn't match Intervals
t.severson
@tseverson-6992
Netherlands

Hello all, when I use dba.plotVenn in DiffBind to draw Venn diagrams of my peaksets, the numbers in the Intervals column don't match the Venn diagram.

> tumors <- dba(sampleSheet='file.csv',minOverlap=F)
AFTER001_pre Breast ER pre  1 bed
AFTER001_post Breast ER post  1 bed

> tumors
17 Samples, 24627 sites in matrix:
              ID Tissue Factor Condition Replicate Peak.caller Intervals
1   AFTER001_pre Breast     ER       pre         1         bed      2071
8  AFTER001_post Breast     ER      post         1         bed      3381

But the generated Venn diagram shows 1204 AFTER001_pre-only peaks, 242 AFTER001_post-only peaks and 839 overlapping peaks, and 1204 + 839 = 2043, not 2071. Does anyone know what I'm doing wrong?

Thanks, Tesa


Sorry, I forgot the session info.

> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] DiffBind_1.6.2       GenomicRanges_1.12.5 IRanges_1.18.3
[4] BiocGenerics_0.6.0   BiocInstaller_1.10.3

loaded via a namespace (and not attached):
 [1] amap_0.8-12        bitops_1.0-6       caTools_1.17.1     edgeR_3.2.4
 [5] gdata_2.13.3       gplots_2.14.2      gtools_3.4.1       KernSmooth_2.23-13
 [9] limma_3.16.8       RColorBrewer_1.0-5 stats4_3.1.1       tools_3.1.1
[13] zlibbioc_1.6.0

Gord Brown
@gord-brown-5664
United Kingdom

Hi, Tesa,

You're most likely not doing anything wrong. To make the Venn diagram, DiffBind merges the peak sets of the samples using a not-very-sophisticated algorithm that merges regions if they overlap at all. In your case, there are probably places where two regions in one sample overlap the same region in the other. They are all merged into one big region, so the numbers won't add up.
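To make the effect concrete, here is a minimal base-R sketch that assumes the simple "merge if they overlap at all" behaviour described above. It is not DiffBind's actual implementation; the function name `merge_and_classify` and the toy coordinates are hypothetical, and only a single chromosome is handled. It pools both samples' peaks, merges any that overlap, and labels each merged region by which samples contributed to it:

```r
# Toy sketch of "merge regions if they overlap at all" (NOT DiffBind's code).
# Peaks are data frames with start/end columns on a single chromosome.
merge_and_classify <- function(a, b) {
  peaks <- rbind(data.frame(a, sample = "A", stringsAsFactors = FALSE),
                 data.frame(b, sample = "B", stringsAsFactors = FALSE))
  peaks <- peaks[order(peaks$start), ]
  group <- integer(nrow(peaks))
  g <- 0L
  cur_end <- -Inf
  for (i in seq_len(nrow(peaks))) {
    if (peaks$start[i] > cur_end) {
      # No overlap with the current merged region: start a new one
      g <- g + 1L
      cur_end <- peaks$end[i]
    } else {
      # Any overlap, however small: extend the current merged region
      cur_end <- max(cur_end, peaks$end[i])
    }
    group[i] <- g
  }
  # Which samples contributed to each merged region?
  in_a <- tapply(peaks$sample == "A", group, any)
  in_b <- tapply(peaks$sample == "B", group, any)
  c(A_only = sum(in_a & !in_b), B_only = sum(!in_a & in_b),
    both = sum(in_a & in_b))
}

# Toy data: the first B peak overlaps two separate A peaks
a <- data.frame(start = c(100, 300, 1000, 2000), end = c(200, 400, 1100, 2100))
b <- data.frame(start = c(150, 5000), end = c(350, 5100))
venn <- merge_and_classify(a, b)
print(venn)
# Sample A has 4 peaks, but A_only + both is 3: two A peaks became one region
```

Because one B peak bridges two A peaks, those three peaks collapse into a single "both" region, so sample A's Venn counts sum to less than its interval count, which is the same pattern as 1204 + 839 < 2071.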

Now and again we talk about cleverer algorithms (perhaps employing PeakSplitter or something along those lines), but it's never made it to the top of our to-do list, alas. Alternatively, we could report two numbers in the overlapping region, one for each sample. But we haven't done that either... :(
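The per-sample alternative could look roughly like the following. This is a hypothetical base-R sketch, not a DiffBind feature; `overlaps_any`, `per_sample_counts` and the toy peaks are invented for illustration, and only a single chromosome is handled. Instead of merging, it asks for each peak whether it overlaps any peak in the other sample, so each sample's "only" and "shared" counts add up to that sample's own peak total:

```r
# Hypothetical alternative (not implemented in DiffBind): report the shared
# category once per sample instead of as one merged count.
# Peaks are data frames with start/end columns on a single chromosome.
overlaps_any <- function(x, y) {
  # TRUE for each peak in x that overlaps at least one peak in y
  vapply(seq_len(nrow(x)), function(i) {
    any(x$start[i] <= y$end & y$start <= x$end[i])
  }, logical(1))
}

per_sample_counts <- function(a, b) {
  a_shared <- overlaps_any(a, b)
  b_shared <- overlaps_any(b, a)
  c(A_only = sum(!a_shared), A_shared = sum(a_shared),
    B_only = sum(!b_shared), B_shared = sum(b_shared))
}

# Toy data: the first B peak overlaps two separate A peaks
a <- data.frame(start = c(100, 300, 1000, 2000), end = c(200, 400, 1100, 2100))
b <- data.frame(start = c(150, 5000), end = c(350, 5100))
counts <- per_sample_counts(a, b)
print(counts)
# By construction, A_only + A_shared == nrow(a) and B_only + B_shared == nrow(b)
```

The trade-off is that the overlap no longer has a single size: here it would be shown as 2 peaks from A and 1 peak from B.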

Hope this helps... or at least explains what's happening.

Cheers,

 - Gord


Thanks for your reply, Gord. That makes sense. I've used the package quite a bit and had never seen this happen, so it was a bit concerning. Now I understand.

Cheers!

Tesa
