Question

DiffBind: Values of Called1 and Called2 do not appear to coincide with input peaks

0

Entering edit mode

Ian • 0

@ian-23882

Last seen 3.8 years ago

United Kingdom

Hi,

I have run DiffBind where the input is 200bp regions centred upon MACS2 summits run using --call-summits. I opted to run dba.count with summits=FALSE.

samples_qval <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval <- dba.count(samples_qval, minOverlap=2, summits=FALSE, bParallel = FALSE)
samples_qval <- dba.normalize(samples_qval)
samples_qval <- dba.contrast(samples_qval, minMembers=2, categories=DBA_CONDITION)
samples_qval <- dba.analyze(samples_qval, bParallel=FALSE)
samples_qval_C1 <- dba.report(samples_qval, contrast=1, bCalled=TRUE, bCounts=TRUE, bCalledDetail = TRUE, bFlip = FALSE)

However, the values in the Called1 and Called2 columns do not appear to correspond to the original peaks. I would expect that if the new DiffBind region (greater than 200bp due to summit=FALSE) overlaps with the original 200bp peaks that this would add to the Called1 and Called2 tallies.

Is there something odd with my code, or a misunderstanding of the affect of using summits=FALSE?

Thank you,
Ian

sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.4.0              SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.0       
 [7] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.0            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] amap_0.8-18              colorspace_2.0-2         rjson_0.2.20             hwriter_1.3.2            ellipsis_0.3.2           XVector_0.34.0          
  [7] rstudioapi_0.13          ggrepel_0.9.1            bit64_4.0.5              AnnotationDbi_1.56.1     fansi_0.5.0              mvtnorm_1.1-3           
 [13] apeglm_1.16.0            splines_4.1.2            cachem_1.0.6             geneplotter_1.72.0       knitr_1.36               jsonlite_1.7.2          
 [19] Rsamtools_2.10.0         annotate_1.72.0          ashr_2.2-47              png_0.1-7                GreyListChIP_1.26.0      compiler_4.1.2          
 [25] httr_1.4.2               assertthat_0.2.1         Matrix_1.3-4             fastmap_1.1.0            limma_3.50.0             cli_3.1.0               
 [31] htmltools_0.5.2          tools_4.1.2              coda_0.19-4              gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.7  
 [37] systemPipeR_2.0.2        dplyr_1.0.7              ShortRead_1.52.0         Rcpp_1.0.7               bbmle_1.0.24             jquerylib_0.1.4         
 [43] vctrs_0.3.8              Biostrings_2.62.0        rtracklayer_1.54.0       xfun_0.27                stringr_1.4.0            lifecycle_1.0.1         
 [49] irlba_2.3.3              restfulr_0.0.13          gtools_3.9.2             XML_3.99-0.8             zlibbioc_1.40.0          MASS_7.3-54             
 [55] scales_1.1.1             BSgenome_1.62.0          RColorBrewer_1.1-2       yaml_2.2.1               memoise_2.0.0            ggplot2_3.3.5           
 [61] emdbook_1.3.12           bdsmatrix_1.3-4          latticeExtra_0.6-29      stringi_1.7.5            RSQLite_2.2.8            SQUAREM_2021.1          
 [67] genefilter_1.76.0        BiocIO_1.4.0             caTools_1.18.2           BiocParallel_1.28.0      truncnorm_1.0-8          rlang_0.4.12            
 [73] pkgconfig_2.0.3          bitops_1.0-7             evaluate_0.14            lattice_0.20-45          invgamma_1.1             purrr_0.3.4             
 [79] GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1         plyr_1.8.6               magrittr_2.0.1          
 [85] DESeq2_1.34.0            R6_2.5.1                 gplots_3.1.1             generics_0.1.1           DelayedArray_0.20.0      DBI_1.1.1               
 [91] pillar_1.6.4             survival_3.2-13          KEGGREST_1.34.0          RCurl_1.98-1.5           mixsqp_0.3-43            tibble_3.1.5            
 [97] crayon_1.4.2             KernSmooth_2.23-20       utf8_1.2.2               rmarkdown_2.11           jpeg_0.1-9               locfit_1.5-9.4          
[103] grid_4.1.2               blob_1.2.2               digest_0.6.28            xtable_1.8-4             numDeriv_2016.8-1.1      munsell_0.5.0

DiffBind • 2.4k views

ADD COMMENT • link updated 4.0 years ago by Rory Stark ★ 5.2k • written 4.0 years ago by Ian • 0

score 0 · Answer 1 · 2022-01-12

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 12 months ago

Cambridge, UK

If you can send me a way to access your DBA objects, I can have a look to what what is happening. I'd need to see the DBA object before the call to dba.count()as well as the final version after the call to dba.analyze().

ADD COMMENT • link 4.0 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Can I send them to an email via Zendto, or via a shared Dropbox folder? Do you need an RDS or Rdata file? Thank you.

ADD REPLY • link 4.0 years ago Ian • 0

0

Entering edit mode

Yes, send an email with links to where I can download the saved objects to the maintainer email listed on the package page.

Optimally, you will save each object in a separate file using dba.save().

ADD REPLY • link 4.0 years ago Rory Stark ★ 5.2k

0

Entering edit mode

I jumped the gun and added RDS files to the Dropbox folder. I'll use dba.save().

ADD REPLY • link 4.0 years ago Ian • 0

0

Entering edit mode

I've got the data and have had a look. Can you give me examples of merged peaks where the Called1 and Called2 are not reporting correct values in the samples_qval_C1 report?

ADD REPLY • link 4.0 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory:

Hope you are well.

Can you help me out with this problem that I am facing with Diffbind?

Not able to replicated Diffbind example dataset.

ADD REPLY • link 4.0 years ago Ayyappa ▴ 10

0

Entering edit mode

Thank you for looking. I have put pictures of the following regions in UCSC browser in the Dropbox folder that are focused on the regions below (zoomed out 1.5x to give context). The number next to the 200bp summit region is the qvalue from MACS2. The pictures show the original MACS2 peaks and the 200bp regions centred upon the MACS2 summit that are used as the input to DiffBind.

Ex1 chr12 6084297 6085025 - Called1 = 0, Called2 = 0, but each has an input peak.

Ex2 chr12 47166002 47166241 - Called1 = 2, Called2 = 3, but NP2 and NP19 had no summit region.

Ex3 chr4 42360321 42360584 - Called1 = 1, Called2 = 0, but NP1, 2 and 13 had input peaks

I had given you the link to the full UCSC session in a file, in the Dropbox folder, called 'UCSC_link_17Jan22.txt'.

Thank you, Ian

ADD REPLY • link 4.0 years ago Ian • 0

1

Entering edit mode

Thank you for helping me track own this bug. I have completed a fix that will appear in the next few days as DiffBind_3.4.5.

ADD REPLY • link 4.0 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Glad to help. Thank you for your effort!

Does this mean I am using DiffBind in an unusual way? Are you able to explain what the problem was? I am wondering whether it is just the Called1/Called2 numbers that are wrongly reported or whether it also affects the other statistics? I.e. should I rerun previous analyses?

ADD REPLY • link 4.0 years ago Ian • 0

0

Entering edit mode

Nothing too unusual, just a bug where in certain conditions the Called columns were being computed using the unmerged datasets.

This only affects the Called columns in a report returned by dba.report(). I did extensive testing to make sure it did not impact other operations such as forming consensus peaksets, computing overlaps, and generating Venn diagrams. No calculations or reporting of any statistics are impacted.

Unfortunately, you should re-run any previous analyses if you were relying on the Called1 and Called2 columns. Some of these may be incorrect depending on the exact sequence of commands you used to generate the analysis object.

If you re-load an older DBA object with this issue, it will repaired if possible. If not, you'll see a warning if you try to generate a report with any of the Called columns, and the report will be generated without those columns present, from version 3.4.6. (With the objects you sent me, the "pre" one can be repaired but the "post" one can not.)

ADD REPLY • link 4.0 years ago Rory Stark ★ 5.2k