Entering edit mode
Hi there,
I got the error message below when using dba.count()
with consensus peaks. Can you help to solve the problem? Thanks!
Re-centering peaks...
Error in heights * sapply(called, function(x) x) : non-conformable arrays
Here is more information. My code:
data(tamoxifen_peaks)
tamoxifen_consensus <-
dba.peakset(tamoxifen,
consensus = c(DBA_TISSUE, DBA_CONDITION),
minOverlap = 0.66)
tamoxifen_consensus <- dba(tamoxifen_consensus,
mask = tamoxifen_consensus$masks$Consensus,
minOverlap = 1)
consensus_peaks <- dba.peakset(tamoxifen_consensus, bRetrieve = TRUE)
setwd(vignette_data_dir)
tamoxifen <- dba.count(tamoxifen, peaks=consensus_peaks)
Here is the output and error message:
Computing summits...
Sample: reads/Chr18_BT474_ER_1.bam125
Sample: reads/Chr18_BT474_ER_2.bam125
Sample: reads/Chr18_MCF7_ER_1.bam125
Sample: reads/Chr18_MCF7_ER_2.bam125
Sample: reads/Chr18_MCF7_ER_3.bam125
Sample: reads/Chr18_T47D_ER_1.bam125
Sample: reads/Chr18_T47D_ER_2.bam125
Sample: reads/Chr18_TAMR_ER_1.bam125
Sample: reads/Chr18_TAMR_ER_2.bam125
Sample: reads/Chr18_ZR75_ER_1.bam125
Sample: reads/Chr18_ZR75_ER_2.bam125
Sample: reads/Chr18_BT474_input.bam125
Sample: reads/Chr18_MCF7_input.bam125
Sample: reads/Chr18_T47D_input.bam125
Sample: reads/Chr18_TAMR_input.bam125
Sample: reads/Chr18_ZR75_input.bam125
Re-centering peaks...
Error in heights * sapply(called, function(x) x) : non-conformable arrays
traceback()
returns:
3: pv.Recenter(pv, summits, (numpeaks - numAdded + 1):numpeaks,
pv$called)
2: pv.counts(DBA, peaks = peaks, minOverlap = minOverlap, defaultScore = score,
bLog = bLog, insertLength = fragmentSize, bOnlyCounts = TRUE,
bCalledMasks = TRUE, minMaxval = filter, bParallel = bParallel,
bUseLast = bUseLast, bWithoutDupes = bRemoveDuplicates, bScaleControl = bScaleControl,
filterFun = filterFun, bLowMem = bUseSummarizeOverlaps, readFormat = readFormat,
summits = summits, minMappingQuality = mapQCth, minCount = minCount,
bSubControl = bSubControl, maxGap = maxGap)
1: dba.count(tamoxifen, peaks = consensus_peaks)
Session information:
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur 10.16
Matrix products: default
LAPACK: /usr/local/anaconda3/envs/bioc3.14-py39/lib/libopenblasp-r0.3.18.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0
[3] dplyr_1.0.7 purrr_0.3.4
[5] readr_2.1.1 tidyr_1.1.4
[7] tibble_3.1.6 ggplot2_3.3.5
[9] tidyverse_1.3.1 DiffBind_3.4.7
[11] csaw_1.28.0 SummarizedExperiment_1.24.0
[13] Biobase_2.54.0 MatrixGenerics_1.6.0
[15] matrixStats_0.61.0 GenomicRanges_1.46.1
[17] GenomeInfoDb_1.30.0 IRanges_2.28.0
[19] S4Vectors_0.32.3 BiocGenerics_0.40.0
[21] shiny_1.7.1
loaded via a namespace (and not attached):
[1] readxl_1.3.1 backports_1.4.1
[3] servr_0.24 plyr_1.8.6
[5] BiocParallel_1.28.3 usethis_2.1.5
[7] amap_0.8-18 digest_0.6.29
[9] invgamma_1.1 htmltools_0.5.2
[11] SQUAREM_2021.1 fansi_1.0.2
[13] magrittr_2.0.2 memoise_2.0.1
[15] BSgenome_1.62.0 tzdb_0.2.0
[17] limma_3.50.0 remotes_2.4.2
[19] Biostrings_2.62.0 modelr_0.1.8
[21] systemPipeR_2.0.5 bdsmatrix_1.3-4
[23] prettyunits_1.1.1 jpeg_0.1-9
[25] colorspace_2.0-2 rvest_1.0.2
[27] apeglm_1.16.0 ggrepel_0.9.1
[29] haven_2.4.3 xfun_0.29
[31] callr_3.7.0 crayon_1.4.2
[33] RCurl_1.98-1.5 jsonlite_1.7.3
[35] glue_1.6.1 gtable_0.3.0
[37] zlibbioc_1.40.0 XVector_0.34.0
[39] DelayedArray_0.20.0 pkgbuild_1.3.1
[41] scales_1.1.1 mvtnorm_1.1-3
[43] DBI_1.1.2 edgeR_3.36.0
[45] miniUI_0.1.1.1 Rcpp_1.0.8
[47] xtable_1.8-4 emdbook_1.3.12
[49] truncnorm_1.0-8 httr_1.4.2
[51] metapod_1.2.0 htmlwidgets_1.5.4
[53] gplots_3.1.1 RColorBrewer_1.1-2
[55] ellipsis_0.3.2 pkgconfig_2.0.3
[57] XML_3.99-0.8 dbplyr_2.1.1
[59] sass_0.4.0 locfit_1.5-9.4
[61] utf8_1.2.2 tidyselect_1.1.1
[63] rlang_1.0.0 later_1.3.0
[65] cellranger_1.1.0 munsell_0.5.0
[67] tools_4.1.2 cachem_1.0.6
[69] cli_3.1.1 generics_0.1.1
[71] broom_0.7.12 devtools_2.4.3
[73] evaluate_0.14 fastmap_1.1.0
[75] yaml_2.2.2 processx_3.5.2
[77] knitr_1.37 fs_1.5.2
[79] caTools_1.18.2 mime_0.12
[81] xml2_1.3.3 brio_1.1.3
[83] compiler_4.1.2 rstudioapi_0.13
[85] png_0.1-7 testthat_3.1.2
[87] reprex_2.0.1 bslib_0.3.1
[89] stringi_1.7.6 ps_1.6.0
[91] blogdown_1.7 desc_1.4.0
[93] lattice_0.20-45 Matrix_1.4-0
[95] vctrs_0.3.8 pillar_1.6.5
[97] lifecycle_1.0.1 BiocManager_1.30.16
[99] jquerylib_0.1.4 bitops_1.0-7
[101] irlba_2.3.5 httpuv_1.6.5
[103] rtracklayer_1.54.0 BiocIO_1.4.0
[105] R6_2.5.1 latticeExtra_0.6-29
[107] hwriter_1.3.2 bookdown_0.24
[109] promises_1.2.0.1 ShortRead_1.52.0
[111] KernSmooth_2.23-20 sessioninfo_1.2.2
[113] MASS_7.3-55 gtools_3.9.2
[115] assertthat_0.2.1 pkgload_1.2.4
[117] rjson_0.2.21 rprojroot_2.0.2
[119] withr_2.4.3 GenomicAlignments_1.30.0
[121] Rsamtools_2.10.0 GenomeInfoDbData_1.2.7
[123] hms_1.1.1 parallel_4.1.2
[125] grid_4.1.2 coda_0.19-4
[127] rmarkdown_2.11 GreyListChIP_1.26.0
[129] ashr_2.2-47 mixsqp_0.3-43
[131] bbmle_1.0.24 lubridate_1.8.0
[133] numDeriv_2016.8-1.1 restfulr_0.0.13
Thank you, Rory!
Hi Rory,
I am currently encountering the same error. I have copied my code below:
The CSV file whose path is shown in the first line contains path info for all my BAM and BED files (I can provide the info from that file if it would be helpful). After running for a couple hours with no message, I received the following:
I have run this code as recently as 3 or 4 months ago with no issues. Using
traceback()
produced exactly the same message Mark saw/provided above.Below is my session info:
I appreciate any help you can provide!
Nathaniel
Given that it runs for hours, I assume you have some combination of a large number of samples, peaks, and/or reads, is that correct?
The two things I recommend here are a) running this with the currently released version and b) running
dba.count()
withbParallel = FALSE
. This may cause it to take an even longer time to run (although there are performance improvements in the current version over the one you are using).One possibility is that
DiffBind
is taking too many cores for parallel processing (not enough memory to support all the parallel operations). So you could try settingTcf4_CRISPRa_ATAC$config$cores <- 4
(or another number that works) to get some benefits of parallelization without overloading memory.Another possibility: I recently had a report of someone having this problem. Then he told me how he fixed it:
I am in the process of addressing this in an upcoming fix, but in the meantime you may want to check that your
bamReads
are all unique. If you need to use the samebamRead
file for more than one sample, I have some workarounds for that.If these do not solve your problem, let me know and we can figure out how to take a deeper look.
Hi Rory,
My apologies for the delay. I tried all of the measures you suggested, and I am still receiving the same error. Specifically:
bParallel = FALSE
to thedba.count
functionSpecifically for point (4), I ran:
with the
Tcf4_CRISPRa_ATAC
object created as it was in my initial post. It may be worth noting that this did run faster than my initial attempt, even with the lack of parallel processing.I received the following output:
Session info:
One possibility I could envision is that the BAM files were somehow corrupted when I copied them over from our HPC. However, I have used these files for other analyses with no issues, so that may be unlikely, but thought I would mention it anyway. I appreciate your continued help!
I've checked in a fix for this (
DiffBind_3.6.3
).It had to do with specifying
minOverlap=0
in the initial call todba()
. Did you mean for the consensus peakset used indba.count()
to include all the (merged) peaks? If so you would need to specifyminOverlap=0
in the call todba.count()
as well (or instead). Indeed, if you move theminOverlap=2
from thedba()
call to thedba.count()
call, your code will run fine even without the fix.Also, you should have a very specific reason to set
bUseSummarizeOverlaps=FALSE
, especially if you have any paired-end sequencing data.Thanks a lot Rory! I wanted to follow up for a bit more information about the potential issues encountered when setting
bUseSummarizeOverlaps=FALSE
with paired-end sequencing data. In my case (and others in my lab), we are analyzing ATAC-seq data (PE75). As part of our pipeline, we remove all duplicate reads using Picard. We also remove any reads that are not properly paired. Given these steps, would settingbUseSummarizeOverlaps=FALSE
still be potentially problematic? And, more broadly, what are the inherent limitations when usingsummarizeOverlap
with paired-end data? I have found some conflicting information about this in various places online, and depending on the nature of the limitations, my lab and I are curious whether we should re-analyze some of our existing data.Thanks again for your continued help!