DiffBind: Values of Called1 and Called2 do not appear to coincide with input peaks
Entering edit mode
Ian • 0
Last seen 1 day ago
United Kingdom


I have run DiffBind where the input is 200bp regions centred upon MACS2 summits run using --call-summits. I opted to run dba.count with summits=FALSE.

samples_qval <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval <- dba.count(samples_qval, minOverlap=2, summits=FALSE, bParallel = FALSE)
samples_qval <- dba.normalize(samples_qval)
samples_qval <- dba.contrast(samples_qval, minMembers=2, categories=DBA_CONDITION)
samples_qval <- dba.analyze(samples_qval, bParallel=FALSE)
samples_qval_C1 <- dba.report(samples_qval, contrast=1, bCalled=TRUE, bCounts=TRUE, bCalledDetail = TRUE, bFlip = FALSE)

However, the values in the Called1 and Called2 columns do not appear to correspond to the original peaks. I would expect that if the new DiffBind region (greater than 200bp due to summit=FALSE) overlaps with the original 200bp peaks that this would add to the Called1 and Called2 tallies.

Is there something odd with my code, or a misunderstanding of the affect of using summits=FALSE?

Thank you,

sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.4.0              SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.0       
 [7] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.0            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] amap_0.8-18              colorspace_2.0-2         rjson_0.2.20             hwriter_1.3.2            ellipsis_0.3.2           XVector_0.34.0          
  [7] rstudioapi_0.13          ggrepel_0.9.1            bit64_4.0.5              AnnotationDbi_1.56.1     fansi_0.5.0              mvtnorm_1.1-3           
 [13] apeglm_1.16.0            splines_4.1.2            cachem_1.0.6             geneplotter_1.72.0       knitr_1.36               jsonlite_1.7.2          
 [19] Rsamtools_2.10.0         annotate_1.72.0          ashr_2.2-47              png_0.1-7                GreyListChIP_1.26.0      compiler_4.1.2          
 [25] httr_1.4.2               assertthat_0.2.1         Matrix_1.3-4             fastmap_1.1.0            limma_3.50.0             cli_3.1.0               
 [31] htmltools_0.5.2          tools_4.1.2              coda_0.19-4              gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.7  
 [37] systemPipeR_2.0.2        dplyr_1.0.7              ShortRead_1.52.0         Rcpp_1.0.7               bbmle_1.0.24             jquerylib_0.1.4         
 [43] vctrs_0.3.8              Biostrings_2.62.0        rtracklayer_1.54.0       xfun_0.27                stringr_1.4.0            lifecycle_1.0.1         
 [49] irlba_2.3.3              restfulr_0.0.13          gtools_3.9.2             XML_3.99-0.8             zlibbioc_1.40.0          MASS_7.3-54             
 [55] scales_1.1.1             BSgenome_1.62.0          RColorBrewer_1.1-2       yaml_2.2.1               memoise_2.0.0            ggplot2_3.3.5           
 [61] emdbook_1.3.12           bdsmatrix_1.3-4          latticeExtra_0.6-29      stringi_1.7.5            RSQLite_2.2.8            SQUAREM_2021.1          
 [67] genefilter_1.76.0        BiocIO_1.4.0             caTools_1.18.2           BiocParallel_1.28.0      truncnorm_1.0-8          rlang_0.4.12            
 [73] pkgconfig_2.0.3          bitops_1.0-7             evaluate_0.14            lattice_0.20-45          invgamma_1.1             purrr_0.3.4             
 [79] GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1         plyr_1.8.6               magrittr_2.0.1          
 [85] DESeq2_1.34.0            R6_2.5.1                 gplots_3.1.1             generics_0.1.1           DelayedArray_0.20.0      DBI_1.1.1               
 [91] pillar_1.6.4             survival_3.2-13          KEGGREST_1.34.0          RCurl_1.98-1.5           mixsqp_0.3-43            tibble_3.1.5            
 [97] crayon_1.4.2             KernSmooth_2.23-20       utf8_1.2.2               rmarkdown_2.11           jpeg_0.1-9               locfit_1.5-9.4          
[103] grid_4.1.2               blob_1.2.2               digest_0.6.28            xtable_1.8-4             numDeriv_2016.8-1.1      munsell_0.5.0
DiffBind • 264 views
Entering edit mode
Rory Stark ★ 4.2k
Last seen 19 hours ago
CRUK, Cambridge, UK

If you can send me a way to access your DBA objects, I can have a look to what what is happening. I'd need to see the DBA object before the call to dba.count()as well as the final version after the call to dba.analyze().

Entering edit mode

Can I send them to an email via Zendto, or via a shared Dropbox folder? Do you need an RDS or Rdata file? Thank you.

Entering edit mode

Yes, send an email with links to where I can download the saved objects to the maintainer email listed on the package page.

Optimally, you will save each object in a separate file using dba.save().

Entering edit mode

I jumped the gun and added RDS files to the Dropbox folder. I'll use dba.save().

Entering edit mode

I've got the data and have had a look. Can you give me examples of merged peaks where the Called1 and Called2 are not reporting correct values in the samples_qval_C1 report?

Entering edit mode

Hi Rory:

Hope you are well.

Can you help me out with this problem that I am facing with Diffbind?

Not able to replicated Diffbind example dataset.

Entering edit mode

Thank you for looking. I have put pictures of the following regions in UCSC browser in the Dropbox folder that are focused on the regions below (zoomed out 1.5x to give context). The number next to the 200bp summit region is the qvalue from MACS2. The pictures show the original MACS2 peaks and the 200bp regions centred upon the MACS2 summit that are used as the input to DiffBind.

Ex1 chr12 6084297 6085025 - Called1 = 0, Called2 = 0, but each has an input peak.

Ex2 chr12 47166002 47166241 - Called1 = 2, Called2 = 3, but NP2 and NP19 had no summit region.

Ex3 chr4 42360321 42360584 - Called1 = 1, Called2 = 0, but NP1, 2 and 13 had input peaks

I had given you the link to the full UCSC session in a file, in the Dropbox folder, called 'UCSC_link_17Jan22.txt'.

Thank you, Ian

Entering edit mode

Thank you for helping me track own this bug. I have completed a fix that will appear in the next few days as DiffBind_3.4.5.

Entering edit mode

Glad to help. Thank you for your effort!

Does this mean I am using DiffBind in an unusual way? Are you able to explain what the problem was? I am wondering whether it is just the Called1/Called2 numbers that are wrongly reported or whether it also affects the other statistics? I.e. should I rerun previous analyses?


Login before adding your answer.

Traffic: 483 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6