More regions in union when increasing DBA$config$mergeOverlap
1
0
Entering edit mode
Ian D. ▴ 70
@ian-donaldson-4761
Last seen 8 hours ago
United Kingdom

Hello,

Recently I have been asked to run DiffBind with an increased minimum overlap between input regions. I have tested the different between using config$mergeOverlap <- 1 (the default) and config$mergeOverlap <- 60 (which equates to 30pc of the length of my 200bp input regions). However, the sites in matrix is greater for the more stringent 30% overlap (see code extract below). This seems wrong to me, or am I misunderstanding the usage of config$mergeOverlap or the output message?

Thank you,
Ian

# load uniformly 200bp regions and use default overlap of 1bp
# same result if config$mergeOverlap is left unchanged from the default
samples_qval_1bp <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_1bp$config$mergeOverlap <- 1
samples_qval_1bp <- dba(samples_qval_1bp)
samples_qval_1bp

# result
9 Samples, 134545 sites in matrix (183389 total):


# load uniformly 200bp regions and use default overlap of 60bp (30% of input region length)
samples_qval_30pc <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval_30pc$config$mergeOverlap <- 60
samples_qval_30pc <- dba(samples_qval_30pc)
samples_qval_30pc

# result
9 Samples, 139882 sites in matrix (202656 total):


sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.4.11             SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0         
 [6] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1         IRanges_2.28.0              S4Vectors_0.32.3            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
 [1] bitops_1.0-7             RColorBrewer_1.1-2       numDeriv_2016.8-1.1      tools_4.1.2              bslib_0.4.2              irlba_2.3.5             
 [7] utf8_1.2.2               R6_2.5.1                 KernSmooth_2.23-20       DBI_1.1.2                colorspace_2.0-2         apeglm_1.16.0           
[13] tidyselect_1.1.1         compiler_4.1.2           cli_3.1.1                DelayedArray_0.20.0      rtracklayer_1.54.0       sass_0.4.5              
[19] caTools_1.18.2           scales_1.1.1             SQUAREM_2021.1           mvtnorm_1.1-3            mixsqp_0.3-43            stringr_1.4.0           
[25] digest_0.6.29            Rsamtools_2.10.0         rmarkdown_2.11           XVector_0.34.0           jpeg_0.1-9               pkgconfig_2.0.3         
[31] htmltools_0.5.5          BSgenome_1.62.0          invgamma_1.1             bbmle_1.0.24             fastmap_1.1.0            limma_3.50.1            
[37] htmlwidgets_1.5.4        rlang_1.1.0              rstudioapi_0.13          BiocIO_1.4.0             jquerylib_0.1.4          generics_0.1.2          
[43] hwriter_1.3.2            jsonlite_1.7.3           BiocParallel_1.28.3      gtools_3.9.2             dplyr_1.0.7              RCurl_1.98-1.5          
[49] magrittr_2.0.2           GenomeInfoDbData_1.2.7   Matrix_1.4-0             Rcpp_1.0.8               munsell_0.5.0            fansi_1.0.2             
[55] lifecycle_1.0.1          stringi_1.7.6            yaml_2.2.2               MASS_7.3-55              zlibbioc_1.40.0          plyr_1.8.6              
[61] gplots_3.1.1             grid_4.1.2               ggrepel_0.9.1            bdsmatrix_1.3-4          crayon_1.4.2             lattice_0.20-45         
[67] Biostrings_2.62.0        locfit_1.5-9.4           knitr_1.37               pillar_1.7.0             rjson_0.2.21             systemPipeR_2.0.5       
[73] XML_3.99-0.8             glue_1.6.1               evaluate_0.14            ShortRead_1.52.0         GreyListChIP_1.26.0      latticeExtra_0.6-29     
[79] vctrs_0.3.8              png_0.1-7                gtable_0.3.0             purrr_0.3.4              amap_0.8-18              assertthat_0.2.1        
[85] ashr_2.2-54              cachem_1.0.6             ggplot2_3.3.5            emdbook_1.3.12           xfun_0.29                restfulr_0.0.13         
[91] coda_0.19-4              truncnorm_1.0-8          tibble_3.1.6             GenomicAlignments_1.30.0 ellipsis_0.3.2
DiffBind • 350 views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 13 days ago
Cambridge, UK

I think the documentation is out of sync with the code. Positive values represent gaps between intervals and negative values represent how much intervals overlap, with the default value of -1 resulting in megring peaks that overlap byt at least 1 basepair. So you should set the value to -60, not 60. I'll see about fixing the documentation.

ADD COMMENT
0
Entering edit mode

Hello. Thanks for the reply.

Sorry about this. My testing is telling me that 1 is the default (a gap), not -1 (1bp overlap). Negative numbers are decreasing as expected, but I am unsure about what the 1bp gap as a default represents. Does a default of mergeOverlap = 1 mean that regions will be merged if they overlap or are up to 1bp away from each other?

Thank you, Ian

This is the default where config$mergeOverlap = NULL (how it is without me changing anything)
9 Samples, 134545 sites in matrix (183389 total)

samples_qval <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)

This is where config$mergeOverlap = 1.
9 Samples, 134545 sites in matrix (183389 total) = same as above.

samples_qval_1bp <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_1bp$config$mergeOverlap <- 1
samples_qval_1bp <- dba(samples_qval_1bp)

This is where config$mergeOverlap = -1.
9 Samples, 134406 sites in matrix (182919 total):

samples_qval_m1bp <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_m1bp$config$mergeOverlap <- -1
samples_qval_m1bp <- dba(samples_qval_m1bp)

This is where config$mergeOverlap = -60.
9 Samples, 130414 sites in matrix (171033 total)

samples_qval_30pc <- dba(sampleSheet="sample_sheet_qval.csv", minOverlap=2)
samples_qval_30pc$config$mergeOverlap <- -60
samples_qval_30pc <- dba(samples_qval_30pc)
ADD REPLY
0
Entering edit mode

Nudging my question. I could do with confirmation before proceeding with my current projects. Thank you.

ADD REPLY
0
Entering edit mode

I am so confused and frustrated learning to use this software. Isn't how you merge the peaksets one of the most important steps for the comparative analysis? Default 1 bp overlap doesn't make any sense. The 1-bp overlapped peaks could be completely different TF binding sites. Now you added that the negative value is actually for merging overlapped peaks, which contradicts the published documentation. I guess I will seek other analysis pipelines.

ADD REPLY

Login before adding your answer.

Traffic: 936 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6