Question: bplapply stalling upon error with large data objects
Aaron Lun (Cambridge, United Kingdom) wrote 3.0 years ago:

I stumbled across some strange behaviour of bplapply when a large data object is passed to FUN and an error is also raised within FUN:

library(BiocParallel)
bigmat <- matrix(rnorm(1e7), nrow=50) # 50 x 200000 doubles, about 80 MB

.recount_cells <- function(x, incoming)
{
    stop("YAY")
    return(x)
}

system.time(bplapply(1:2, FUN=.recount_cells, incoming=bigmat, 
    BPPARAM=SerialParam())) # Finishes instantly
system.time(bplapply(1:2, FUN=.recount_cells, incoming=bigmat, 
    BPPARAM=MulticoreParam(2))) # Killed after several minutes

This doesn't happen if I set bigmat <- 1, nor if I remove the stop call in .recount_cells. In those situations, both of the bplapply calls above execute in a timely manner.

I have encountered related problems with bplapply stalling even when no error is raised in FUN, but the above example is the simplest to reproduce on my system (Ubuntu, below). The same behaviour is observed with BiocParallel 1.7.5 and on Mac OS X.

Anyway, here's my sessionInfo():

R version 3.3.0 Patched (2016-05-03 r70580)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] BiocParallel_1.6.3

loaded via a namespace (and not attached):
[1] parallel_3.3.0
modified 3.0 years ago by Martin Morgan • written 3.0 years ago by Aaron Lun

I don't yet have a solution, but the problem can be seen in this non-parallel code:

fun <- function(x) stop()
do.call("fun", list(matrix(rnorm(1e7), 50)))

which gets slower as the data get larger.
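
One way to check the size dependence on a given R installation is to time the failing call at a few input sizes. This is only a sketch: the names `fail`, `time_failing_call`, `sizes` and `timings` are made up for illustration, the sizes are kept small so it runs quickly, and on R versions that include the later fix the differences may be negligible.

```r
# Hypothetical stand-in for 'fun' above: errors without touching its argument.
fail <- function(x) stop("deliberate error")

# Hypothetical helper: seconds taken by a failing do.call() on n random values.
time_failing_call <- function(n) {
    big <- rnorm(n)
    system.time(
        try(do.call("fail", list(big)), silent = TRUE)
    )[["elapsed"]]
}

sizes <- c(1e3, 1e4, 1e5)
timings <- vapply(sizes, time_failing_call, numeric(1))
print(data.frame(n = sizes, seconds = timings))
```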

modified 3.0 years ago • written 3.0 years ago by Martin Morgan
Answer: bplapply stalling upon error with large data objects
Martin Morgan (United States) wrote 3.0 years ago:

The problem is that, on error, R creates a variable .Traceback that contains a text version of the call stack. Each element of the call stack includes a text representation of the entire bigmat object, so building the traceback takes longer as the object grows. Fixing this requires a change to R (after svn revision r71040) and an updated BiocParallel (version 1.6.4 in release / 1.7.6 in devel).
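
The effect can be illustrated in base R without raising an error at all. The sketch below assumes that deparsing the current call is a fair proxy for the work done when a text traceback is built; `call_size` and the other variable names are hypothetical. With do.call() the evaluated values are embedded in the call itself, whereas a direct call only holds the symbol.

```r
# Hypothetical helper: number of characters in the deparsed current call.
call_size <- function(x) {
    sum(nchar(deparse(sys.call())))
}

small <- rnorm(10)
large <- rnorm(1000)

# Direct call: the call on the stack is just call_size(large).
direct_size <- call_size(large)

# do.call(): the call on the stack contains every value of the argument.
docall_small <- do.call("call_size", list(small))
docall_large <- do.call("call_size", list(large))

print(c(direct = direct_size, small = docall_small, large = docall_large))
```

The deparsed size grows with the length of the argument only in the do.call() case, which matches the observation that the slowdown depends on the size of the data.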

written 3.0 years ago by Martin Morgan

Thanks. The MulticoreParam example now finishes up in 30 seconds, which is better. (Still a bit odd that it takes so long to just throw an error, though.) This also fixes the related problem that I referred to in my original post.

modified 3.0 years ago • written 3.0 years ago by Aaron Lun

OK, versions 1.6.5 / 1.7.7 might now be a better solution.
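
To see whether a given BiocParallel version is at least the one suggested here, something like the following could be used. It is a sketch only: `has_suggested_version` is a hypothetical helper, not a BiocParallel function, and it assumes the usual Bioconductor convention that 1.6.x was the release branch and 1.7.x the devel branch at the time.

```r
# Hypothetical helper: TRUE if version v is at least 1.6.5 (release branch)
# or at least 1.7.7 (devel branch and anything later).
has_suggested_version <- function(v) {
    v <- package_version(as.character(v))
    if (v >= "1.7.0") v >= "1.7.7" else v >= "1.6.5"
}

has_suggested_version("1.6.3")  # version from the sessionInfo() in the question
has_suggested_version("1.7.5")  # devel version mentioned in the question

# Check the installed copy, if there is one.
if (requireNamespace("BiocParallel", quietly = TRUE)) {
    print(has_suggested_version(packageVersion("BiocParallel")))
}

# The R side of the fix is tied to an svn revision, reported here as a string.
R.version[["svn rev"]]
```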

written 3.0 years ago by Martin Morgan