Hello,
Recently I have been asked to run DiffBind with an increased minimum overlap between input regions. I have tested the different between using config$mergeOverlap <- 1 (the default) and config$mergeOverlap <- 60 (which equates to 30pc of the length of my 200bp input regions). However, the sites in matrix is greater for the more stringent 30% overlap (see code extract below). This seems wrong to me, or am I misunderstanding the usage of config$mergeOverlap or the output message?
Thank you,
Ian
# load uniformly 200bp regions and use default overlap of 1bp
# same result if config$mergeOverlap is left unchanged from the default
samples_qval_1bp <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_1bp$config$mergeOverlap <- 1
samples_qval_1bp <- dba(samples_qval_1bp)
samples_qval_1bp
# result
9 Samples, 134545 sites in matrix (183389 total):
# load uniformly 200bp regions and use default overlap of 60bp (30% of input region length)
samples_qval_30pc <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_30pc$config$mergeOverlap <- 60
samples_qval_30pc <- dba(samples_qval_30pc)
samples_qval_30pc
# result
9 Samples, 139882 sites in matrix (202656 total):
sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] DiffBind_3.4.11 SummarizedExperiment_1.24.0 Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[6] GenomicRanges_1.46.1 GenomeInfoDb_1.30.1 IRanges_2.28.0 S4Vectors_0.32.3 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 RColorBrewer_1.1-2 numDeriv_2016.8-1.1 tools_4.1.2 bslib_0.4.2 irlba_2.3.5
[7] utf8_1.2.2 R6_2.5.1 KernSmooth_2.23-20 DBI_1.1.2 colorspace_2.0-2 apeglm_1.16.0
[13] tidyselect_1.1.1 compiler_4.1.2 cli_3.1.1 DelayedArray_0.20.0 rtracklayer_1.54.0 sass_0.4.5
[19] caTools_1.18.2 scales_1.1.1 SQUAREM_2021.1 mvtnorm_1.1-3 mixsqp_0.3-43 stringr_1.4.0
[25] digest_0.6.29 Rsamtools_2.10.0 rmarkdown_2.11 XVector_0.34.0 jpeg_0.1-9 pkgconfig_2.0.3
[31] htmltools_0.5.5 BSgenome_1.62.0 invgamma_1.1 bbmle_1.0.24 fastmap_1.1.0 limma_3.50.1
[37] htmlwidgets_1.5.4 rlang_1.1.0 rstudioapi_0.13 BiocIO_1.4.0 jquerylib_0.1.4 generics_0.1.2
[43] hwriter_1.3.2 jsonlite_1.7.3 BiocParallel_1.28.3 gtools_3.9.2 dplyr_1.0.7 RCurl_1.98-1.5
[49] magrittr_2.0.2 GenomeInfoDbData_1.2.7 Matrix_1.4-0 Rcpp_1.0.8 munsell_0.5.0 fansi_1.0.2
[55] lifecycle_1.0.1 stringi_1.7.6 yaml_2.2.2 MASS_7.3-55 zlibbioc_1.40.0 plyr_1.8.6
[61] gplots_3.1.1 grid_4.1.2 ggrepel_0.9.1 bdsmatrix_1.3-4 crayon_1.4.2 lattice_0.20-45
[67] Biostrings_2.62.0 locfit_1.5-9.4 knitr_1.37 pillar_1.7.0 rjson_0.2.21 systemPipeR_2.0.5
[73] XML_3.99-0.8 glue_1.6.1 evaluate_0.14 ShortRead_1.52.0 GreyListChIP_1.26.0 latticeExtra_0.6-29
[79] vctrs_0.3.8 png_0.1-7 gtable_0.3.0 purrr_0.3.4 amap_0.8-18 assertthat_0.2.1
[85] ashr_2.2-54 cachem_1.0.6 ggplot2_3.3.5 emdbook_1.3.12 xfun_0.29 restfulr_0.0.13
[91] coda_0.19-4 truncnorm_1.0-8 tibble_3.1.6 GenomicAlignments_1.30.0 ellipsis_0.3.2
Hello. Thanks for the reply.
Sorry about this. My testing is telling me that 1 is the default (a gap), not -1 (1bp overlap). Negative numbers are decreasing as expected, but I am unsure about what the 1bp gap as a default represents. Does a default of mergeOverlap = 1 mean that regions will be merged if they overlap or are up to 1bp away from each other?
Thank you, Ian
This is the default where config$mergeOverlap = NULL (how it is without me changing anything)
9 Samples, 134545 sites in matrix (183389 total)
This is where config$mergeOverlap = 1.
9 Samples, 134545 sites in matrix (183389 total) = same as above.
This is where config$mergeOverlap = -1.
9 Samples, 134406 sites in matrix (182919 total):
This is where config$mergeOverlap = -60.
9 Samples, 130414 sites in matrix (171033 total)
Nudging my question. I could do with confirmation before proceeding with my current projects. Thank you.
I am so confused and frustrated learning to use this software. Isn't how you merge the peaksets one of the most important steps for the comparative analysis? Default 1 bp overlap doesn't make any sense. The 1-bp overlapped peaks could be completely different TF binding sites. Now you added that the negative value is actually for merging overlapped peaks, which contradicts the published documentation. I guess I will seek other analysis pipelines.
Please could you comment on this Rory Stark
Just nudging this @rory. Just would like to know if I am correct on this or not thanks.
Just nudging this Rory Stark
Rory Stark Please can you confirm whether the default criteria for the merging of regions in the creation of a union of peaks is a 1bp overlap, or a 1bp space? Thanks you.
Hi @rory. Please can you comment on my query. I could be completely wrong, but would like to know one way or the other. Thanks, Ian