Question: bplapply with ShortRead functions
11 days ago, Vivek.b wrote:

Hey everyone

I am having a problem using ShortRead functions with bplapply while trying to demultiplex a FASTQ file. The following example function does nothing but create multiple paired-end FASTQ files, all containing the same reads:

demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {

    param <- BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write: one task per destination
    info <- BiocParallel::bplapply(seq_along(destinations), function(i) {
        split1 <- file.path(outdir, paste0(destinations[i], "_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i], "_R2.fastq.gz"))
        ## open input streams and make sure both are closed on exit
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R1))
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            ## stop once the input is exhausted
            if (length(fq_R1) == 0) {
                break
            }
            ## placeholder filter: keep at most the first 10 reads of each chunk
            id2keep <- seq_len(min(10, length(fq_R1)))
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
    }, BPPARAM = param)
}


This works when I use a single core (it writes the files), but it gets stuck when I use more than one core. Can anyone point out what the issue is?

Thanks in advance

My sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/
LAPACK: /usr/lib/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14
11 days ago, Martin Morgan (United States) wrote:

Does it help to use SnowParam() instead of MulticoreParam()? Otherwise, I think the processes may be accessing shared global variables in an unhealthy way.
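
To try this suggestion, only the back-end object has to change. The following is a minimal sketch rather than the poster's code: the helper name `make_param` and the toy workload are hypothetical, and it assumes BiocParallel is installed. `SnowParam()` starts separate R processes, whereas `MulticoreParam()` forks the current session, so the two differ in what state the workers share with the parent.

```r
library(BiocParallel)

## Hypothetical helper (not part of BiocParallel): build either back-end
## from a name so the same code can be run with both.
make_param <- function(backend = c("snow", "multicore"), ncores = 2) {
    backend <- match.arg(backend)
    switch(backend,
        snow      = SnowParam(workers = ncores),
        multicore = MulticoreParam(workers = ncores))
}

## Toy workload standing in for the per-destination demultiplexing step:
## each task reports the process ID of the worker that ran it.
param <- make_param("snow", ncores = 2)
pids <- bplapply(1:4, function(i) Sys.getpid(), BPPARAM = param)
```

In `demultiplex_fastq()` the equivalent change would be to replace the `MulticoreParam(workers = ncores)` line with `SnowParam(workers = ncores)`. Because the function already calls ShortRead through `ShortRead::`, the workers should not need the package attached.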


Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer and freed a few GB of memory, and the function now works. So it needed more memory when run on multiple cores than on a single core. It was a weird coincidence: yesterday afternoon I tried our RStudio server, a local server and my own computer, and none of them had enough free memory to run it.
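
One way to bound the memory each worker needs, sketched under assumptions: `FastqStreamer()` takes an `n` argument giving the number of records returned per `yield()`, so a worker holds at most one chunk of that size from each stream at a time. The helper name `count_reads_chunked` and the chunk size are hypothetical, and whether a smaller chunk would have avoided the hang described above has not been tested.

```r
library(ShortRead)

## Hypothetical helper: stream a FASTQ file in chunks of `chunk_size`
## records and return the total number of reads seen.
count_reads_chunked <- function(fastq, chunk_size = 1e5) {
    stream <- FastqStreamer(fastq, n = chunk_size)
    on.exit(close(stream))
    total <- 0
    repeat {
        fq <- yield(stream)
        ## an empty chunk means the input is exhausted
        if (length(fq) == 0) {
            break
        }
        total <- total + length(fq)
    }
    total
}
```

The same `n = chunk_size` argument could be passed to both `FastqStreamer()` calls inside `demultiplex_fastq()`, keeping the two streams in step.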

Reply written 10 days ago by Vivek.b