More regions in union when increasing DBA$config$mergeOverlap
Entering edit mode
Ian D. ▴ 70
Last seen 16 hours ago
United Kingdom


Recently I have been asked to run DiffBind with an increased minimum overlap between input regions. I have tested the different between using config$mergeOverlap <- 1 (the default) and config$mergeOverlap <- 60 (which equates to 30pc of the length of my 200bp input regions). However, the sites in matrix is greater for the more stringent 30% overlap (see code extract below). This seems wrong to me, or am I misunderstanding the usage of config$mergeOverlap or the output message?

Thank you,

# load uniformly 200bp regions and use default overlap of 1bp
# same result if config$mergeOverlap is left unchanged from the default
samples_qval_1bp <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_1bp$config$mergeOverlap <- 1
samples_qval_1bp <- dba(samples_qval_1bp)

# result
9 Samples, 134545 sites in matrix (183389 total):

# load uniformly 200bp regions and use default overlap of 60bp (30% of input region length)
samples_qval_30pc <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_30pc$config$mergeOverlap <- 60
samples_qval_30pc <- dba(samples_qval_30pc)

# result
9 Samples, 139882 sites in matrix (202656 total):

sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/

 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.4.11             SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0         
 [6] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             RColorBrewer_1.1-2       numDeriv_2016.8-1.1      tools_4.1.2              bslib_0.4.2              irlba_2.3.5             
 [7] utf8_1.2.2               R6_2.5.1                 KernSmooth_2.23-20       DBI_1.1.2                colorspace_2.0-2         apeglm_1.16.0           
[13] tidyselect_1.1.1         compiler_4.1.2           cli_3.1.1                DelayedArray_0.20.0      rtracklayer_1.54.0       sass_0.4.5              
[19] caTools_1.18.2           scales_1.1.1             SQUAREM_2021.1           mvtnorm_1.1-3            mixsqp_0.3-43            stringr_1.4.0           
[25] digest_0.6.29            Rsamtools_2.10.0         rmarkdown_2.11           XVector_0.34.0           jpeg_0.1-9               pkgconfig_2.0.3         
[31] htmltools_0.5.5          BSgenome_1.62.0          invgamma_1.1             bbmle_1.0.24             fastmap_1.1.0            limma_3.50.1            
[37] htmlwidgets_1.5.4        rlang_1.1.0              rstudioapi_0.13          BiocIO_1.4.0             jquerylib_0.1.4          generics_0.1.2          
[43] hwriter_1.3.2            jsonlite_1.7.3           BiocParallel_1.28.3      gtools_3.9.2             dplyr_1.0.7              RCurl_1.98-1.5          
[49] magrittr_2.0.2           GenomeInfoDbData_1.2.7   Matrix_1.4-0             Rcpp_1.0.8               munsell_0.5.0            fansi_1.0.2             
[55] lifecycle_1.0.1          stringi_1.7.6            yaml_2.2.2               MASS_7.3-55              zlibbioc_1.40.0          plyr_1.8.6              
[61] gplots_3.1.1             grid_4.1.2               ggrepel_0.9.1            bdsmatrix_1.3-4          crayon_1.4.2             lattice_0.20-45         
[67] Biostrings_2.62.0        locfit_1.5-9.4           knitr_1.37               pillar_1.7.0             rjson_0.2.21             systemPipeR_2.0.5       
[73] XML_3.99-0.8             glue_1.6.1               evaluate_0.14            ShortRead_1.52.0         GreyListChIP_1.26.0      latticeExtra_0.6-29     
[79] vctrs_0.3.8              png_0.1-7                gtable_0.3.0             purrr_0.3.4              amap_0.8-18              assertthat_0.2.1        
[85] ashr_2.2-54              cachem_1.0.6             ggplot2_3.3.5            emdbook_1.3.12           xfun_0.29                restfulr_0.0.13         
[91] coda_0.19-4              truncnorm_1.0-8          tibble_3.1.6             GenomicAlignments_1.30.0 ellipsis_0.3.2
DiffBind • 543 views
Entering edit mode
Rory Stark ★ 5.2k
Last seen 7 weeks ago
Cambridge, UK

I think the documentation is out of sync with the code. Positive values represent gaps between intervals and negative values represent how much intervals overlap, with the default value of -1 resulting in megring peaks that overlap byt at least 1 basepair. So you should set the value to -60, not 60. I'll see about fixing the documentation.

Entering edit mode

Hello. Thanks for the reply.

Sorry about this. My testing is telling me that 1 is the default (a gap), not -1 (1bp overlap). Negative numbers are decreasing as expected, but I am unsure about what the 1bp gap as a default represents. Does a default of mergeOverlap = 1 mean that regions will be merged if they overlap or are up to 1bp away from each other?

Thank you, Ian

This is the default where config$mergeOverlap = NULL (how it is without me changing anything)
9 Samples, 134545 sites in matrix (183389 total)

samples_qval <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)

This is where config$mergeOverlap = 1.
9 Samples, 134545 sites in matrix (183389 total) = same as above.

samples_qval_1bp <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_1bp$config$mergeOverlap <- 1
samples_qval_1bp <- dba(samples_qval_1bp)

This is where config$mergeOverlap = -1.
9 Samples, 134406 sites in matrix (182919 total):

samples_qval_m1bp <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_m1bp$config$mergeOverlap <- -1
samples_qval_m1bp <- dba(samples_qval_m1bp)

This is where config$mergeOverlap = -60.
9 Samples, 130414 sites in matrix (171033 total)

samples_qval_30pc <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_30pc$config$mergeOverlap <- -60
samples_qval_30pc <- dba(samples_qval_30pc)
Entering edit mode

Nudging my question. I could do with confirmation before proceeding with my current projects. Thank you.

Entering edit mode

I am so confused and frustrated learning to use this software. Isn't how you merge the peaksets one of the most important steps for the comparative analysis? Default 1 bp overlap doesn't make any sense. The 1-bp overlapped peaks could be completely different TF binding sites. Now you added that the negative value is actually for merging overlapped peaks, which contradicts the published documentation. I guess I will seek other analysis pipelines.

Entering edit mode

Please could you comment on this Rory Stark

Entering edit mode

Just nudging this @rory. Just would like to know if I am correct on this or not thanks.


Login before adding your answer.

Traffic: 372 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6