Question: Problem with summarizeOverlaps() when reading >1 BAM file: "stop worker failed"
ErickF asked, 16 months ago:

Hi,

I recently started working with RNA-seq data. I used the code below to try to read 2-4 BAM files (with their BAI indexes in the same directory), but I repeatedly get the following error when running summarizeOverlaps():

Error: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

One other time I got this error (with the same code):

Error: 'bplapply' receive data failed:
  error reading from connection

The BAM files come from ~40M single-end 75 bp reads each and are ~2-2.5 GB apiece (aligned using tophat2/bowtie2 against the hg19 reference genome). Code, sessionInfo(), and the last lines of traceback() are below. Of note, this works just fine if I process only one BAM file:

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> grl <- exonsBy(txdb, by="gene")
> bamLst
  BamFileList of length 4
  names(4): file1.bam file2.bam file3.bam file4.bam
> experiment2 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=T, singleEnd=T)
  Error: stop worker failed:
    'clear_cluster' receive data failed:
    reached elapsed time limit

> traceback()  
16: stop(.error_worker_comm(e, "stop worker failed"))  
15: value[[3L]](cond)  
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])  
13: tryCatchList(expr, classes, parentenv, handlers)  

> sessionInfo()  
R version 3.3.1 (2016-06-21)  
Platform: x86_64-apple-darwin13.4.0 (64-bit)  
Running under: OS X 10.11.4 (El Capitan)  
attached base packages:  
[1] stats4  parallel  stats  graphics  grDevices utils  datasets  methods   base  
other attached packages:  
 [1] GenomicAlignments_1.8.3 Rsamtools_1.24.0           Biostrings_2.40.2  
 [4] XVector_0.12.0          SummarizedExperiment_1.2.3 Biobase_2.32.0  
 [7] GenomicRanges_1.24.2    GenomeInfoDb_1.8.1         IRanges_2.6.1  
[10] S4Vectors_0.10.1        BiocGenerics_0.18.0   


It seems to me like this may be related to computer memory (8 GB), cores (4), or something along those lines. Short of using a more powerful computer, is there any way to fix or circumvent this?

Answer, by ErickF, 16 months ago:

Update: this does indeed seem to be related to computing resources (memory, cores, or similar). It worked when I tried smaller files. So I added a yieldSize argument to BamFileList when creating "bamLst", to limit the number of records read from each file at one time:

bamLst <- BamFileList(files1, yieldSize=7500000)

That seems to have fixed the problem, although I wonder whether it also makes things run slower. If anyone has other suggestions, let me know!
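Editor's note: a minimal sketch of the yieldSize fix combined with an explicit cap on the number of parallel workers, so that (workers x yieldSize) stays within the available 8 GB of RAM. The file names below are placeholders; BamFileList, MulticoreParam, and register() are the real Rsamtools/BiocParallel calls.

```r
library(Rsamtools)
library(BiocParallel)

## Placeholder file names for illustration
files1 <- c("file1.bam", "file2.bam", "file3.bam", "file4.bam")

## Cap the number of records held in memory per file at any one time
bamLst <- BamFileList(files1, yieldSize = 1000000)

## Also limit how many BAM files are processed simultaneously;
## summarizeOverlaps() picks up the registered backend by default
register(MulticoreParam(workers = 2))
```

A smaller yieldSize lowers peak memory at the cost of more read/count iterations per file, so there is a genuine speed/memory trade-off.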


I would expect a yieldSize of > 100000 to be fine for speed. You could process in serial with

BiocParallel::register(BiocParallel::SerialParam())

or perhaps look at Rsubread::featureCounts() or the bamsignals package.

(Martin Morgan, 16 months ago)
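Editor's note: the featureCounts alternative mentioned above could look roughly like this. A sketch only; the file names are placeholders, and annot.inbuilt = "hg19" selects Rsubread's built-in hg19 gene annotation.

```r
library(Rsubread)

## Count single-end reads against the built-in hg19 annotation;
## nthreads controls parallelism directly, independent of BiocParallel
fc <- featureCounts(files = c("file1.bam", "file2.bam"),
                    annot.inbuilt = "hg19",
                    isPairedEnd = FALSE,
                    nthreads = 2)

## fc$counts is a gene-by-sample matrix of read counts
head(fc$counts)
```

featureCounts streams through each BAM file in C code, so it tends to be faster and lighter on memory than counting in R.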

Will definitely try those, thanks!

(ErickF, 16 months ago)