Hi all:
I am trying to analyze zebrafish ATAC seq data using DiffBind. When I run dba.count I get an error like this.
Error: Error processing one or more read files. Check warnings(). In addition: Warning messages: 1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) : all scheduled cores encountered errors in user code 2: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 3: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 4: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 5: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors
It would be great if you could help me out with this. Thanks
> foxc1b5dpf <- read.delim("foxc1b5dpf.csv")
>
> foxc1b5dpf
SampleID Tissue Treatment
1 WT_D5_Rep1 WT_D5 None
2 WT_D5_Rep2 WT_D5 None
3 Foxc1b_D5_Rep1 Foxc1b_D5 None
4 Foxc1b_D5_Rep2 Foxc1b_D5 None
Replicate bamReads
1 1 23841_S4_R1_001.bam
2 2 23846_S9_R1_001.bam
3 1 23842_S5_R1_001.bam
4 2 23847_S10_R1_001.bam
Peaks PeakCaller
1 23841_S4_R1_001.bed bed
2 23846_S9_R1_001.bed bed
3 23842_S5_R1_001.bed bed
4 23847_S10_R1_001.bed bed
>
> foxc1b5dpfanalysis <- dba(sampleSheet = foxc1b5dpf)
WT_D5_Rep1 WT_D5 None 1 bed
WT_D5_Rep2 WT_D5 None 2 bed
Foxc1b_D5_Rep1 Foxc1b_D5 None 1 bed
Foxc1b_D5_Rep2 Foxc1b_D5 None 2 bed
>
> foxc1b5dpfanalysis
4 Samples, 2513433 sites in matrix (3256107 total):
ID Tissue Treatment
1 WT_D5_Rep1 WT_D5 None
2 WT_D5_Rep2 WT_D5 None
3 Foxc1b_D5_Rep1 Foxc1b_D5 None
4 Foxc1b_D5_Rep2 Foxc1b_D5 None
Replicate Intervals
1 1 18987157
2 2 22607429
3 1 38612942
4 2 31976221
>
> plot(foxc1b5dpfanalysis)
>
> foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, bUseSummarizeOverlaps = TRUE)
Computing summits...
Re-centering peaks...
Error: Error processing one or more read files. Check warnings().
In addition: Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
all scheduled cores encountered errors in user code
2: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors
3: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors
4: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors
5: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors
# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session
sessionInfo( )
sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.7 LTS
Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.6.0 LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocParallel_1.28.3 DiffBind_3.4.7 SummarizedExperiment_1.24.0
[4] Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0
[10] S4Vectors_0.32.3 BiocGenerics_0.40.0
loaded via a namespace (and not attached):
[1] bitops_1.0-7 RColorBrewer_1.1-2 numDeriv_2016.8-1.1 tools_4.1.2
[5] utf8_1.2.2 R6_2.5.1 irlba_2.3.5 KernSmooth_2.23-20
[9] DBI_1.1.2 colorspace_2.0-2 apeglm_1.16.0 tidyselect_1.1.1
[13] compiler_4.1.2 cli_3.1.1 DelayedArray_0.20.0 rtracklayer_1.54.0
[17] caTools_1.18.2 scales_1.1.1 SQUAREM_2021.1 mvtnorm_1.1-3
[21] mixsqp_0.3-43 stringr_1.4.0 digest_0.6.29 Rsamtools_2.10.0
[25] XVector_0.34.0 jpeg_0.1-9 pkgconfig_2.0.3 htmltools_0.5.2
[29] BSgenome_1.62.0 fastmap_1.1.0 invgamma_1.1 bbmle_1.0.24
[33] limma_3.50.0 htmlwidgets_1.5.4 rlang_1.0.0 rstudioapi_0.13
[37] BiocIO_1.4.0 generics_0.1.1 hwriter_1.3.2 gtools_3.9.2
[41] dplyr_1.0.7 RCurl_1.98-1.5 magrittr_2.0.2 GenomeInfoDbData_1.2.7
[45] Matrix_1.4-0 Rcpp_1.0.8 munsell_0.5.0 fansi_1.0.2
[49] lifecycle_1.0.1 stringi_1.7.6 yaml_2.2.2 MASS_7.3-55
[53] zlibbioc_1.40.0 gplots_3.1.1 plyr_1.8.6 grid_4.1.2
[57] parallel_4.1.2 ggrepel_0.9.1 bdsmatrix_1.3-4 crayon_1.4.2
[61] lattice_0.20-45 Biostrings_2.62.0 locfit_1.5-9.4 pillar_1.6.5
[65] rjson_0.2.21 systemPipeR_2.0.5 XML_3.99-0.8 glue_1.6.1
[69] ShortRead_1.52.0 GreyListChIP_1.26.0 latticeExtra_0.6-29 BiocManager_1.30.16
[73] png_0.1-7 vctrs_0.3.8 gtable_0.3.0 purrr_0.3.4
[77] amap_0.8-18 assertthat_0.2.1 ashr_2.2-47 ggplot2_3.3.5
[81] emdbook_1.3.12 xfun_0.29 restfulr_0.0.13 coda_0.19-4
[85] truncnorm_1.0-8 tibble_3.1.6 GenomicAlignments_1.30.0 tinytex_0.36
[89] ellipsis_0.3.2
Hi Rory:
Thank you for the reply. When I run the dba.count in serial mode it gives me this error.
'''
Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 1 remote errors, element index: 1 0 unevaluated and other errors first remote error: each range must have a width that is < 2^31 '''
Hi Rory:
I was wondering if the 2^31 ERROR is because of the the high number of sites in matrix and intervals in my dataset. It would be great if you could provide your insights to this error. Thanks
foxc1b5dpfanalysis
4 Samples, 2513433 sites in matrix (3256107 total):
1 WT_D5_Rep1 WT_D5 None 2 WT_D5_Rep2 WT_D5 None 3 Foxc1b_D5_Rep1 Foxc1b_D5 None 4 Foxc1b_D5_Rep2 Foxc1b_D5 None
Replicate Intervals 1 1 18987157 2 2 22607429 3 1 38612942 4 2 31976221 '''
The main issue with having >1M consensus peaks is memory, but I'd expect to see the error message be more clear in this regard (plus you only have four samples). You could try it with a more restrictive consensus set, eg:
It would be also be informative to try this again without computing summits:
If neither of these yields a solution, I'd need to get access to your data to track it down.
Hi Rory:
Thank you for the reply.
The dba.count showed the error again when I tried with minOverlap=3.
However it worked when I tried
foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, summits=FALSE, bParallel=FALSE)
Does keeping the summits=FALSE, make a difference in the Differential analysis?
It all depends on the nature of the data you are analyzing.
is this a ChIP? Or maybe a Cut&Run? Are you expecting to have 2-3M separate binding sites? That is fairly unusual for a binding protein, but I don't know what you are looking at.
In this situation, I'd consider a few things:
foxc1b5dpfanalysis
object prior to counting:peaks <- dba.peakset(foxc1b5dpfanalysis, bRetrieve=TRUE)
and looking at the widths, e.g.summary(width(peaks))
. You can also look at the sum of the widths (sum(width(peaks))
to see how much of the genome is being marked as enriched.Basically what I suspect is happening is that, the default
summits
value of 200 is resulting in each of your 2.5M tiny peaks being turned into overlapping 401bp peaks, which are then being merged into one giant peak covering the entirely of each chromosome. Which is clearly not what you want!In this case, you should probably not use summits (set
summits=FALSE
), or perhaps use a very small setting, e.g.summits=10
.Note that you can stop using
bParallel=FALSE
now that we have narrowed the problem down to merging after computing summits.