Question: Problem with summarizeOverlaps() when reading >1 BAM file: "stop worker failed"
ErickF wrote (2.4 years ago):


I recently started working with RNA-seq data. I used the code below to try to read 2-4 BAM files (BAM and BAI in the same directory), but I repeatedly get the following error when running summarizeOverlaps():

Error: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

One other time I got this error (with the same code):

Error: 'bplapply' receive data failed:
  error reading from connection

The BAM files come from ~40M single-end 75 bp reads each and are ~2-2.5 GB apiece (aligned with tophat2/bowtie2 against the hg19 reference genome). Code, sessionInfo(), and the last lines of traceback() are below. Of note, this works just fine with a single BAM file:

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> grl <- exonsBy(txdb, by="gene")
> bamLst
  BamFileList of length 4
  names(4): file1.bam file2.bam file3.bam file4.bam
> experiment2 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=T, singleEnd=T)
  Error: stop worker failed:
    'clear_cluster' receive data failed:
    reached elapsed time limit

> traceback()  
16: stop(.error_worker_comm(e, "stop worker failed"))  
15: value[[3L]](cond)  
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])  
13: tryCatchList(expr, classes, parentenv, handlers)  

> sessionInfo()  
R version 3.3.1 (2016-06-21)  
Platform: x86_64-apple-darwin13.4.0 (64-bit)  
Running under: OS X 10.11.4 (El Capitan)  
attached base packages:  
[1] stats4  parallel  stats  graphics  grDevices utils  datasets  methods   base  
other attached packages:  
 [1] GenomicAlignments_1.8.3 Rsamtools_1.24.0           Biostrings_2.40.2  
 [4] XVector_0.12.0          SummarizedExperiment_1.2.3 Biobase_2.32.0  
 [7] GenomicRanges_1.24.2    GenomeInfoDb_1.8.1         IRanges_2.6.1  
[10] S4Vectors_0.10.1        BiocGenerics_0.18.0   


It seems to me this may be related to computer memory (8 GB), cores (4), or something along those lines. Beyond using a more powerful computer, is there any way to fix (or circumvent) this?

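One memory lever worth knowing in this situation: summarizeOverlaps() farms the files out to parallel workers via BiocParallel, and each worker holds its own reads in memory, so registering fewer workers (or a serial backend) bounds peak memory at the cost of speed. A minimal sketch, assuming BiocParallel is installed (GenomicAlignments depends on it):

```r
library(BiocParallel)

## Cap parallelism at 2 workers instead of the default (all cores);
## each worker processes one BAM file at a time.
register(MulticoreParam(workers = 2))

## Or run fully serially -- slowest, but lowest peak memory:
## register(SerialParam())
```

With a backend registered, subsequent summarizeOverlaps() calls pick it up automatically.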
ErickF wrote (2.4 years ago):

Update: it seems this was indeed related to computing resources (memory, cores, or the like). It worked when I tried smaller files, so I added a yieldSize parameter to the BamFileList call that creates bamLst, limiting the number of reads scanned from each file at one time:

bamLst <- BamFileList(files1, yieldSize=7500000)

That seems to have fixed the problem, although I wonder whether it makes things run slower too. If anyone has other suggestions, let me know!

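For reference, the yieldSize fix above slots into the full counting workflow like this (a sketch using the file names and yieldSize from the post; assumes each .bam has its .bai index alongside):

```r
library(GenomicAlignments)   # also attaches SummarizedExperiment
library(Rsamtools)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
grl  <- exonsBy(txdb, by = "gene")

## yieldSize caps how many records each file keeps in memory at
## once; reads are then processed chunk by chunk.
files  <- c("file1.bam", "file2.bam", "file3.bam", "file4.bam")
bamLst <- BamFileList(files, yieldSize = 7500000)

se <- summarizeOverlaps(features = grl, reads = bamLst,
                        ignore.strand = TRUE, singleEnd = TRUE)
head(assay(se))   # gene x sample count matrix
```

Smaller yieldSize values lower peak memory but add per-chunk overhead, so counting can take somewhat longer.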

I would expect a yieldSize of > 100,000 to be OK for speed. You could process in serial with

register(SerialParam())

(from BiocParallel), or perhaps see the Rsubread::featureCounts() or bamsignals packages.

written 2.4 years ago by Martin Morgan
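For the Rsubread route, a sketch of what that might look like (thread count is an arbitrary choice here; note that annot.inbuilt = "hg19" uses Rsubread's built-in RefSeq-based annotation, which differs from the UCSC knownGene TxDb used above, so counts will not be identical):

```r
library(Rsubread)

## Single-end 75 bp reads, as in the post.
fc <- featureCounts(files = c("file1.bam", "file2.bam",
                              "file3.bam", "file4.bam"),
                    annot.inbuilt = "hg19",
                    isPairedEnd = FALSE,
                    nthreads = 2)
head(fc$counts)   # gene-level count matrix, one column per BAM
```

featureCounts streams through each BAM in compiled code, so it is generally much lighter on memory than holding alignments in R.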

Will definitely try those --thanks!

written 2.4 years ago by ErickF