Problem with summarizeOverlaps() when reading >1 BAM file: "stop worker failed"
ErickF ▴ 40
@erickf-11032

Hi,

I recently started working with RNA-seq data. I used the code below to read 2-4 BAM files (each BAM and its BAI index in the same directory), but summarizeOverlaps() repeatedly fails with the following error:

Error: stop worker failed:
  'clear_cluster' receive data failed:
  reached elapsed time limit

One other time I got this error (with the same code):

Error: 'bplapply' receive data failed:
  error reading from connection

The BAM files each hold ~40M single-end 75 bp reads and are ~2-2.5 GB in size (aligned with tophat2/bowtie2 against the hg19 reference genome). The code, sessionInfo(), and the last lines of traceback() are below. Of note, the same code works fine with a single BAM file:

> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> grl <- exonsBy(txdb, by="gene")
> bamLst
  BamFileList of length 4
  names(4): file1.bam file2.bam file3.bam file4.bam
> experiment2 <- summarizeOverlaps(features=grl, reads=bamLst, ignore.strand=T, singleEnd=T)
  Error: stop worker failed:
    'clear_cluster' receive data failed:
    reached elapsed time limit

> traceback()  
16: stop(.error_worker_comm(e, "stop worker failed"))  
15: value[[3L]](cond)  
14: tryCatchOne(expr, names, parentenv, handlers[[1L]])  
13: tryCatchList(expr, classes, parentenv, handlers)  

> sessionInfo()  
R version 3.3.1 (2016-06-21)  
Platform: x86_64-apple-darwin13.4.0 (64-bit)  
Running under: OS X 10.11.4 (El Capitan)  
attached base packages:  
[1] stats4  parallel  stats  graphics  grDevices utils  datasets  methods   base  
other attached packages:  
 [1] GenomicAlignments_1.8.3 Rsamtools_1.24.0           Biostrings_2.40.2  
 [4] XVector_0.12.0          SummarizedExperiment_1.2.3 Biobase_2.32.0  
 [7] GenomicRanges_1.24.2    GenomeInfoDb_1.8.1         IRanges_2.6.1  
[10] S4Vectors_0.10.1        BiocGenerics_0.18.0   

 

This looks to me like it may be related to the computer's memory (8 GB) or number of cores (4). Beyond using a more powerful computer, is there any way to fix or work around this?

ErickF ▴ 40
@erickf-11032

Update: It does seem that this was related to computing resources (memory, cores, or something similar). I tried with smaller files and it worked. So I added a yieldSize argument to BamFileList() when creating "bamLst", to limit the number of reads read from each file at one time:

bamLst <- BamFileList(files1, yieldSize=7500000)

This seems to have fixed the problem, although I wonder whether it also makes things run slower. If anyone has other suggestions, let me know!
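For anyone landing here with the same error, the yieldSize approach can be wrapped up as a small helper. This is only a sketch: the function name `count_with_yield`, its argument names, and the default chunk size of one million records are illustrative assumptions and not part of the original code. The point is that each file is read in chunks instead of all at once, which should keep per-worker memory bounded.

```r
# Sketch only: count_with_yield, bam_paths and the default yield_size are
# hypothetical names/values chosen for illustration.
count_with_yield <- function(bam_paths, features, yield_size = 1000000L) {
  # yieldSize caps how many records are read from each BAM file per chunk
  bam_list <- Rsamtools::BamFileList(bam_paths,
                                     yieldSize = as.integer(yield_size))
  # Same counting call as in the question, now reading in chunks
  GenomicAlignments::summarizeOverlaps(
    features = features,
    reads = bam_list,
    ignore.strand = TRUE,
    singleEnd = TRUE
  )
}
```

With the objects from the question this would be called as `count_with_yield(files1, grl)`. A smaller yield_size lowers peak memory at the cost of more read iterations.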

I would expect a yieldSize of > 100000 to be OK for speed. You could also process the files serially with

BiocParallel::register(BiocParallel::SerialParam())

or perhaps see Rsubread::featureCounts() or the bamsignals package.
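To make the serial suggestion concrete, here is a minimal sketch that registers the serial back-end and then runs the same counting call as in the question. The wrapper name `count_serially` and its argument names are assumptions made for illustration. Note that register() changes the default back-end for the rest of the R session, so later BiocParallel calls will also run serially unless another back-end is registered.

```r
# Sketch only: count_serially is a hypothetical wrapper name.
count_serially <- function(bam_list, features) {
  # Make SerialParam the default back-end so the BAM files are processed
  # one after another instead of on parallel workers
  BiocParallel::register(BiocParallel::SerialParam())
  GenomicAlignments::summarizeOverlaps(
    features = features,
    reads = bam_list,
    ignore.strand = TRUE,
    singleEnd = TRUE
  )
}
```

With the objects from the question this would be `count_serially(bamLst, grl)`. Serial processing trades run time for memory, since only one file is being read at any moment; it can be combined with a yieldSize on the BamFileList.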

Will definitely try those, thanks!
