Hi,
I have run DiffBind where the input is 200bp regions centred upon MACS2 summits run using --call-summits. I opted to run dba.count with summits=FALSE.
samples_qval <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval <- dba.count(samples_qval, minOverlap=2, summits=FALSE, bParallel = FALSE)
samples_qval <- dba.normalize(samples_qval)
samples_qval <- dba.contrast(samples_qval, minMembers=2, categories=DBA_CONDITION)
samples_qval <- dba.analyze(samples_qval, bParallel=FALSE)
samples_qval_C1 <- dba.report(samples_qval, contrast=1, bCalled=TRUE, bCounts=TRUE, bCalledDetail = TRUE, bFlip = FALSE)
However, the values in the Called1 and Called2 columns do not appear to correspond to the original peaks. I would expect that if the new DiffBind region (greater than 200bp due to summit=FALSE) overlaps with the original 200bp peaks that this would add to the Called1 and Called2 tallies.
Is there something odd with my code, or a misunderstanding of the affect of using summits=FALSE?
Thank you,
Ian
sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DiffBind_3.4.0 SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0 GenomicRanges_1.46.0
[7] GenomeInfoDb_1.30.0 IRanges_2.28.0 S4Vectors_0.32.0 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] amap_0.8-18 colorspace_2.0-2 rjson_0.2.20 hwriter_1.3.2 ellipsis_0.3.2 XVector_0.34.0
[7] rstudioapi_0.13 ggrepel_0.9.1 bit64_4.0.5 AnnotationDbi_1.56.1 fansi_0.5.0 mvtnorm_1.1-3
[13] apeglm_1.16.0 splines_4.1.2 cachem_1.0.6 geneplotter_1.72.0 knitr_1.36 jsonlite_1.7.2
[19] Rsamtools_2.10.0 annotate_1.72.0 ashr_2.2-47 png_0.1-7 GreyListChIP_1.26.0 compiler_4.1.2
[25] httr_1.4.2 assertthat_0.2.1 Matrix_1.3-4 fastmap_1.1.0 limma_3.50.0 cli_3.1.0
[31] htmltools_0.5.2 tools_4.1.2 coda_0.19-4 gtable_0.3.0 glue_1.4.2 GenomeInfoDbData_1.2.7
[37] systemPipeR_2.0.2 dplyr_1.0.7 ShortRead_1.52.0 Rcpp_1.0.7 bbmle_1.0.24 jquerylib_0.1.4
[43] vctrs_0.3.8 Biostrings_2.62.0 rtracklayer_1.54.0 xfun_0.27 stringr_1.4.0 lifecycle_1.0.1
[49] irlba_2.3.3 restfulr_0.0.13 gtools_3.9.2 XML_3.99-0.8 zlibbioc_1.40.0 MASS_7.3-54
[55] scales_1.1.1 BSgenome_1.62.0 RColorBrewer_1.1-2 yaml_2.2.1 memoise_2.0.0 ggplot2_3.3.5
[61] emdbook_1.3.12 bdsmatrix_1.3-4 latticeExtra_0.6-29 stringi_1.7.5 RSQLite_2.2.8 SQUAREM_2021.1
[67] genefilter_1.76.0 BiocIO_1.4.0 caTools_1.18.2 BiocParallel_1.28.0 truncnorm_1.0-8 rlang_0.4.12
[73] pkgconfig_2.0.3 bitops_1.0-7 evaluate_0.14 lattice_0.20-45 invgamma_1.1 purrr_0.3.4
[79] GenomicAlignments_1.30.0 htmlwidgets_1.5.4 bit_4.0.4 tidyselect_1.1.1 plyr_1.8.6 magrittr_2.0.1
[85] DESeq2_1.34.0 R6_2.5.1 gplots_3.1.1 generics_0.1.1 DelayedArray_0.20.0 DBI_1.1.1
[91] pillar_1.6.4 survival_3.2-13 KEGGREST_1.34.0 RCurl_1.98-1.5 mixsqp_0.3-43 tibble_3.1.5
[97] crayon_1.4.2 KernSmooth_2.23-20 utf8_1.2.2 rmarkdown_2.11 jpeg_0.1-9 locfit_1.5-9.4
[103] grid_4.1.2 blob_1.2.2 digest_0.6.28 xtable_1.8-4 numDeriv_2016.8-1.1 munsell_0.5.0
Can I send them to an email via Zendto, or via a shared Dropbox folder? Do you need an RDS or Rdata file? Thank you.
Yes, send an email with links to where I can download the saved objects to the maintainer email listed on the package page.
Optimally, you will save each object in a separate file using
dba.save()
.I jumped the gun and added RDS files to the Dropbox folder. I'll use dba.save().
I've got the data and have had a look. Can you give me examples of merged peaks where the Called1 and Called2 are not reporting correct values in the
samples_qval_C1
report?Hi Rory:
Hope you are well.
Can you help me out with this problem that I am facing with Diffbind?
Not able to replicated Diffbind example dataset.
Thank you for looking. I have put pictures of the following regions in UCSC browser in the Dropbox folder that are focused on the regions below (zoomed out 1.5x to give context). The number next to the 200bp summit region is the qvalue from MACS2. The pictures show the original MACS2 peaks and the 200bp regions centred upon the MACS2 summit that are used as the input to DiffBind.
Ex1 chr12 6084297 6085025 - Called1 = 0, Called2 = 0, but each has an input peak.
Ex2 chr12 47166002 47166241 - Called1 = 2, Called2 = 3, but NP2 and NP19 had no summit region.
Ex3 chr4 42360321 42360584 - Called1 = 1, Called2 = 0, but NP1, 2 and 13 had input peaks
I had given you the link to the full UCSC session in a file, in the Dropbox folder, called 'UCSC_link_17Jan22.txt'.
Thank you, Ian
Thank you for helping me track own this bug. I have completed a fix that will appear in the next few days as
DiffBind_3.4.5
.Glad to help. Thank you for your effort!
Does this mean I am using DiffBind in an unusual way? Are you able to explain what the problem was? I am wondering whether it is just the Called1/Called2 numbers that are wrongly reported or whether it also affects the other statistics? I.e. should I rerun previous analyses?
Nothing too unusual, just a bug where in certain conditions the
Called
columns were being computed using the unmerged datasets.This only affects the
Called
columns in a report returned bydba.report()
. I did extensive testing to make sure it did not impact other operations such as forming consensus peaksets, computing overlaps, and generating Venn diagrams. No calculations or reporting of any statistics are impacted.Unfortunately, you should re-run any previous analyses if you were relying on the
Called1
andCalled2
columns. Some of these may be incorrect depending on the exact sequence of commands you used to generate the analysis object.If you re-load an older DBA object with this issue, it will repaired if possible. If not, you'll see a warning if you try to generate a report with any of the
Called
columns, and the report will be generated without those columns present, from version 3.4.6. (With the objects you sent me, the "pre" one can be repaired but the "post" one can not.)