Question

dba.count error

1

Entering edit mode

Ayyappa ▴ 10

@768f3a7c

Last seen 3.8 years ago

United States

Hi all:

I am trying to analyze zebrafish ATAC seq data using DiffBind. When I run dba.count I get an error like this.

Error: Error processing one or more read files. Check warnings(). In addition: Warning messages: 1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) : all scheduled cores encountered errors in user code 2: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 3: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 4: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 5: error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors

It would be great if you could help me out with this. Thanks

> foxc1b5dpf <- read.delim("foxc1b5dpf.csv")
> 
> foxc1b5dpf
        SampleID    Tissue Treatment
1     WT_D5_Rep1     WT_D5      None
2     WT_D5_Rep2     WT_D5      None
3 Foxc1b_D5_Rep1 Foxc1b_D5      None
4 Foxc1b_D5_Rep2 Foxc1b_D5      None
  Replicate             bamReads
1         1  23841_S4_R1_001.bam
2         2  23846_S9_R1_001.bam
3         1  23842_S5_R1_001.bam
4         2 23847_S10_R1_001.bam
                 Peaks PeakCaller
1  23841_S4_R1_001.bed        bed
2  23846_S9_R1_001.bed        bed
3  23842_S5_R1_001.bed        bed
4 23847_S10_R1_001.bed        bed
> 
> foxc1b5dpfanalysis <- dba(sampleSheet = foxc1b5dpf)
WT_D5_Rep1 WT_D5   None 1 bed
WT_D5_Rep2 WT_D5   None 2 bed
Foxc1b_D5_Rep1 Foxc1b_D5   None 1 bed
Foxc1b_D5_Rep2 Foxc1b_D5   None 2 bed
> 
> foxc1b5dpfanalysis
4 Samples, 2513433 sites in matrix (3256107 total):
              ID    Tissue Treatment
1     WT_D5_Rep1     WT_D5      None
2     WT_D5_Rep2     WT_D5      None
3 Foxc1b_D5_Rep1 Foxc1b_D5      None
4 Foxc1b_D5_Rep2 Foxc1b_D5      None
  Replicate Intervals
1         1  18987157
2         2  22607429
3         1  38612942
4         2  31976221
> 
> plot(foxc1b5dpfanalysis)
> 
> foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, bUseSummarizeOverlaps = TRUE)
Computing summits...
Re-centering peaks...
Error: Error processing one or more read files. Check warnings().
In addition: Warning messages:
1: In mclapply(arglist, fn, ..., mc.preschedule = TRUE, mc.allow.recursive = TRUE) :
  all scheduled cores encountered errors in user code
2:   error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 
3:   error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 
4:   error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 
5:   error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 

# include your problematic code here with any corresponding output 
# please also include the results of running the following in an R session 

sessionInfo( )

sessionInfo() R version 4.1.2 (2021-11-01) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.7 LTS

Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.6.0 LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] BiocParallel_1.28.3 DiffBind_3.4.7 SummarizedExperiment_1.24.0 [4] Biobase_2.54.0 MatrixGenerics_1.6.0 matrixStats_0.61.0
[7] GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0
[10] S4Vectors_0.32.3 BiocGenerics_0.40.0

loaded via [1] bitops_1.0-7 [5] utf8_1.2.2 [9] DBI_1.1.2 [13] compiler_4.1.2 [17] caTools_1.18.2 [21] mixsqp_0.3-43 [25] XVector_0.34.0 [29] BSgenome_1.62.0 [33] limma_3.50.0 [37] BiocIO_1.4.0 [41] dplyr_1.0.7 [45] Matrix_1.4-0 [49] lifecycle_1.0.1 [53] zlibbioc_1.40.0 [57] parallel_4.1.2 [61] lattice_0.20-45 [65] rjson_0.2.21 [69] ShortRead_1.52.0 [73] png_0.1-7 [77] amap_0.8-18 [81] emdbook_1.3.12 [85] truncnorm_1.0-8 [89] ellipsis_0.3.2 a namespace (and not attached): RColorBrewer_1.1-2 numDeriv_2016.8-1.1 tools_4.1.2
R6_2.5.1 irlba_2.3.5 KernSmooth_2.23-20
colorspace_2.0-2 apeglm_1.16.0 tidyselect_1.1.1
cli_3.1.1 DelayedArray_0.20.0 rtracklayer_1.54.0
scales_1.1.1 SQUAREM_2021.1 mvtnorm_1.1-3
stringr_1.4.0 digest_0.6.29 Rsamtools_2.10.0
jpeg_0.1-9 pkgconfig_2.0.3 htmltools_0.5.2
fastmap_1.1.0 invgamma_1.1 bbmle_1.0.24
htmlwidgets_1.5.4 rlang_1.0.0 rstudioapi_0.13
generics_0.1.1 hwriter_1.3.2 gtools_3.9.2
RCurl_1.98-1.5 magrittr_2.0.2 GenomeInfoDbData_1.2.7
Rcpp_1.0.8 munsell_0.5.0 fansi_1.0.2
stringi_1.7.6 yaml_2.2.2 MASS_7.3-55
gplots_3.1.1 plyr_1.8.6 grid_4.1.2
ggrepel_0.9.1 bdsmatrix_1.3-4 crayon_1.4.2
Biostrings_2.62.0 locfit_1.5-9.4 pillar_1.6.5
systemPipeR_2.0.5 XML_3.99-0.8 glue_1.6.1
GreyListChIP_1.26.0 latticeExtra_0.6-29 BiocManager_1.30.16
vctrs_0.3.8 gtable_0.3.0 purrr_0.3.4
assertthat_0.2.1 ashr_2.2-47 ggplot2_3.3.5
xfun_0.29 restfulr_0.0.13 coda_0.19-4
tibble_3.1.6 GenomicAlignments_1.30.0 tinytex_0.36

DiffBind • 3.8k views

ADD COMMENT • link updated 3.9 years ago by Rory Stark ★ 5.2k • written 3.9 years ago by Ayyappa ▴ 10

score 0 · Answer 1 · 2022-01-27

0

Entering edit mode

Rory Stark ★ 5.2k

@rory-stark-5741

Last seen 11 months ago

Cambridge, UK

Please run this again in serial mode:

foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, bParallel=FALSE)

This will result in each .bam file being processed in turn with any associated warnings and errors reported. The specific warning or error usually gives a good idea of the issues with the .bam file.

ADD COMMENT • link 3.9 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory:

Thank you for the reply. When I run the dba.count in serial mode it gives me this error.

'''

foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, bParallel=FALSE) Computing summits... Sample: 23841_S4_R1_001.bam125 Sample: 23846_S9_R1_001.bam125 Sample: 23842_S5_R1_001.bam125 Sample: 23847_S10_R1_001.bam125 Re-centering peaks... Sample: 23841_S4_R1_001.bam125 Reads will be counted as Paired-end.

Error in (function (cond) : error in evaluating the argument 'x' in selecting a method for function 'assay': BiocParallel errors 1 remote errors, element index: 1 0 unevaluated and other errors first remote error: each range must have a width that is < 2^31 '''

ADD REPLY • link 3.9 years ago Ayyappa ▴ 10

0

Entering edit mode

Hi Rory:

I was wondering if the 2^31 ERROR is because of the the high number of sites in matrix and intervals in my dataset. It would be great if you could provide your insights to this error. Thanks

foxc1b5dpfanalysis

4 Samples, 2513433 sites in matrix (3256107 total):

          ID    Tissue Treatment

1 WT_D5_Rep1 WT_D5 None 2 WT_D5_Rep2 WT_D5 None 3 Foxc1b_D5_Rep1 Foxc1b_D5 None 4 Foxc1b_D5_Rep2 Foxc1b_D5 None

Replicate Intervals 1 1 18987157 2 2 22607429 3 1 38612942 4 2 31976221 '''

ADD REPLY • link 3.9 years ago Ayyappa ▴ 10

0

Entering edit mode

The main issue with having >1M consensus peaks is memory, but I'd expect to see the error message be more clear in this regard (plus you only have four samples). You could try it with a more restrictive consensus set, eg:

foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, minOverlap=3, bParallel=FALSE)

It would be also be informative to try this again without computing summits:

foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, summits=FALSE, bParallel=FALSE)

If neither of these yields a solution, I'd need to get access to your data to track it down.

ADD REPLY • link 3.9 years ago Rory Stark ★ 5.2k

0

Entering edit mode

Hi Rory:

Thank you for the reply.

The dba.count showed the error again when I tried with minOverlap=3.

However it worked when I tried

foxc1b5dpfcount <- dba.count(foxc1b5dpfanalysis, summits=FALSE, bParallel=FALSE)

Does keeping the summits=FALSE, make a difference in the Differential analysis?

ADD REPLY • link 3.9 years ago Ayyappa ▴ 10

0

Entering edit mode

It all depends on the nature of the data you are analyzing.

is this a ChIP? Or maybe a Cut&Run? Are you expecting to have 2-3M separate binding sites? That is fairly unusual for a binding protein, but I don't know what you are looking at.

In this situation, I'd consider a few things:

Look at the data in a browser to see if the peak calls are reflecting the enrichment patterns. Are there really a huge number of focused binding sites?
What is the distribution of the width of the peaks? You can look at this by retrieving the peaks from the foxc1b5dpfanalysis object prior to counting: peaks <- dba.peakset(foxc1b5dpfanalysis, bRetrieve=TRUE) and looking at the widths, e.g. summary(width(peaks)). You can also look at the sum of the widths (sum(width(peaks)) to see how much of the genome is being marked as enriched.

Basically what I suspect is happening is that, the default summits value of 200 is resulting in each of your 2.5M tiny peaks being turned into overlapping 401bp peaks, which are then being merged into one giant peak covering the entirely of each chromosome. Which is clearly not what you want!

In this case, you should probably not use summits (set summits=FALSE), or perhaps use a very small setting, e.g. summits=10.

Note that you can stop using bParallel=FALSE now that we have narrowed the problem down to merging after computing summits.

ADD REPLY • link 3.9 years ago Rory Stark ★ 5.2k