DiffBind: Values of Called1 and Called2 do not appear to coincide with input peaks
1
0
Entering edit mode
Ian • 0
@ian-23882
Last seen 2.7 years ago
United Kingdom

Hi,

I have run DiffBind where the input is 200bp regions centred upon MACS2 summits run using --call-summits. I opted to run dba.count with summits=FALSE.

samples_qval <- dba(sampleSheet="sample_sheet_qval.csv")
samples_qval <- dba.count(samples_qval, minOverlap=2, summits=FALSE, bParallel = FALSE)
samples_qval <- dba.normalize(samples_qval)
samples_qval <- dba.contrast(samples_qval, minMembers=2, categories=DBA_CONDITION)
samples_qval <- dba.analyze(samples_qval, bParallel=FALSE)
samples_qval_C1 <- dba.report(samples_qval, contrast=1, bCalled=TRUE, bCounts=TRUE, bCalledDetail = TRUE, bFlip = FALSE)

However, the values in the Called1 and Called2 columns do not appear to correspond to the original peaks. I would expect that if the new DiffBind region (greater than 200bp due to summit=FALSE) overlaps with the original 200bp peaks that this would add to the Called1 and Called2 tallies.

Is there something odd with my code, or a misunderstanding of the affect of using summits=FALSE?

Thank you,
Ian

sessionInfo( )
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DiffBind_3.4.0              SummarizedExperiment_1.24.0 Biobase_2.54.0              MatrixGenerics_1.6.0        matrixStats_0.61.0          GenomicRanges_1.46.0       
 [7] GenomeInfoDb_1.30.0         IRanges_2.28.0              S4Vectors_0.32.0            BiocGenerics_0.40.0        

loaded via a namespace (and not attached):
  [1] amap_0.8-18              colorspace_2.0-2         rjson_0.2.20             hwriter_1.3.2            ellipsis_0.3.2           XVector_0.34.0          
  [7] rstudioapi_0.13          ggrepel_0.9.1            bit64_4.0.5              AnnotationDbi_1.56.1     fansi_0.5.0              mvtnorm_1.1-3           
 [13] apeglm_1.16.0            splines_4.1.2            cachem_1.0.6             geneplotter_1.72.0       knitr_1.36               jsonlite_1.7.2          
 [19] Rsamtools_2.10.0         annotate_1.72.0          ashr_2.2-47              png_0.1-7                GreyListChIP_1.26.0      compiler_4.1.2          
 [25] httr_1.4.2               assertthat_0.2.1         Matrix_1.3-4             fastmap_1.1.0            limma_3.50.0             cli_3.1.0               
 [31] htmltools_0.5.2          tools_4.1.2              coda_0.19-4              gtable_0.3.0             glue_1.4.2               GenomeInfoDbData_1.2.7  
 [37] systemPipeR_2.0.2        dplyr_1.0.7              ShortRead_1.52.0         Rcpp_1.0.7               bbmle_1.0.24             jquerylib_0.1.4         
 [43] vctrs_0.3.8              Biostrings_2.62.0        rtracklayer_1.54.0       xfun_0.27                stringr_1.4.0            lifecycle_1.0.1         
 [49] irlba_2.3.3              restfulr_0.0.13          gtools_3.9.2             XML_3.99-0.8             zlibbioc_1.40.0          MASS_7.3-54             
 [55] scales_1.1.1             BSgenome_1.62.0          RColorBrewer_1.1-2       yaml_2.2.1               memoise_2.0.0            ggplot2_3.3.5           
 [61] emdbook_1.3.12           bdsmatrix_1.3-4          latticeExtra_0.6-29      stringi_1.7.5            RSQLite_2.2.8            SQUAREM_2021.1          
 [67] genefilter_1.76.0        BiocIO_1.4.0             caTools_1.18.2           BiocParallel_1.28.0      truncnorm_1.0-8          rlang_0.4.12            
 [73] pkgconfig_2.0.3          bitops_1.0-7             evaluate_0.14            lattice_0.20-45          invgamma_1.1             purrr_0.3.4             
 [79] GenomicAlignments_1.30.0 htmlwidgets_1.5.4        bit_4.0.4                tidyselect_1.1.1         plyr_1.8.6               magrittr_2.0.1          
 [85] DESeq2_1.34.0            R6_2.5.1                 gplots_3.1.1             generics_0.1.1           DelayedArray_0.20.0      DBI_1.1.1               
 [91] pillar_1.6.4             survival_3.2-13          KEGGREST_1.34.0          RCurl_1.98-1.5           mixsqp_0.3-43            tibble_3.1.5            
 [97] crayon_1.4.2             KernSmooth_2.23-20       utf8_1.2.2               rmarkdown_2.11           jpeg_0.1-9               locfit_1.5-9.4          
[103] grid_4.1.2               blob_1.2.2               digest_0.6.28            xtable_1.8-4             numDeriv_2016.8-1.1      munsell_0.5.0
DiffBind • 1.9k views
ADD COMMENT
0
Entering edit mode
Rory Stark ★ 5.2k
@rory-stark-5741
Last seen 5 weeks ago
Cambridge, UK

If you can send me a way to access your DBA objects, I can have a look to what what is happening. I'd need to see the DBA object before the call to dba.count()as well as the final version after the call to dba.analyze().

ADD COMMENT
0
Entering edit mode

Can I send them to an email via Zendto, or via a shared Dropbox folder? Do you need an RDS or Rdata file? Thank you.

ADD REPLY
0
Entering edit mode

Yes, send an email with links to where I can download the saved objects to the maintainer email listed on the package page.

Optimally, you will save each object in a separate file using dba.save().

ADD REPLY
0
Entering edit mode

I jumped the gun and added RDS files to the Dropbox folder. I'll use dba.save().

ADD REPLY
0
Entering edit mode

I've got the data and have had a look. Can you give me examples of merged peaks where the Called1 and Called2 are not reporting correct values in the samples_qval_C1 report?

ADD REPLY
0
Entering edit mode

Hi Rory:

Hope you are well.

Can you help me out with this problem that I am facing with Diffbind?

Not able to replicated Diffbind example dataset.

ADD REPLY
0
Entering edit mode

Thank you for looking. I have put pictures of the following regions in UCSC browser in the Dropbox folder that are focused on the regions below (zoomed out 1.5x to give context). The number next to the 200bp summit region is the qvalue from MACS2. The pictures show the original MACS2 peaks and the 200bp regions centred upon the MACS2 summit that are used as the input to DiffBind.

Ex1 chr12 6084297 6085025 - Called1 = 0, Called2 = 0, but each has an input peak.

Ex2 chr12 47166002 47166241 - Called1 = 2, Called2 = 3, but NP2 and NP19 had no summit region.

Ex3 chr4 42360321 42360584 - Called1 = 1, Called2 = 0, but NP1, 2 and 13 had input peaks

I had given you the link to the full UCSC session in a file, in the Dropbox folder, called 'UCSC_link_17Jan22.txt'.

Thank you, Ian

ADD REPLY
1
Entering edit mode

Thank you for helping me track own this bug. I have completed a fix that will appear in the next few days as DiffBind_3.4.5.

ADD REPLY
0
Entering edit mode

Glad to help. Thank you for your effort!

Does this mean I am using DiffBind in an unusual way? Are you able to explain what the problem was? I am wondering whether it is just the Called1/Called2 numbers that are wrongly reported or whether it also affects the other statistics? I.e. should I rerun previous analyses?

ADD REPLY
0
Entering edit mode

Nothing too unusual, just a bug where in certain conditions the Called columns were being computed using the unmerged datasets.

This only affects the Called columns in a report returned by dba.report(). I did extensive testing to make sure it did not impact other operations such as forming consensus peaksets, computing overlaps, and generating Venn diagrams. No calculations or reporting of any statistics are impacted.

Unfortunately, you should re-run any previous analyses if you were relying on the Called1 and Called2 columns. Some of these may be incorrect depending on the exact sequence of commands you used to generate the analysis object.

If you re-load an older DBA object with this issue, it will repaired if possible. If not, you'll see a warning if you try to generate a report with any of the Called columns, and the report will be generated without those columns present, from version 3.4.6. (With the objects you sent me, the "pre" one can be repaired but the "post" one can not.)

ADD REPLY

Login before adding your answer.

Traffic: 860 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6