Question: bplapply with ShortRead functions
19 months ago by
Vivek.b100 wrote:

Hey everyone

I am having trouble using ShortRead functions inside bplapply while trying to demultiplex a FASTQ file. The following example function does nothing but create multiple paired-end FASTQ files, all containing the same reads:

demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {

    param <- BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write; each worker handles one destination
    info <- BiocParallel::bplapply(seq_along(destinations), function(i) {
        split1 <- file.path(outdir, paste0(destinations[i], "_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i], "_R2.fastq.gz"))
        ## open input streams and make sure both are closed on exit
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        on.exit(close(stream_R1))
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            ## stop once the input is exhausted
            if (length(fq_R1) == 0) {
                break
            }
            ## keep (up to) the first 10 reads of each chunk
            id2keep <- seq_len(min(10, length(fq_R1)))
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
    }, BPPARAM = param)
}


This works (the files are written) when I use a single core, but it hangs when I use more than one core. Can anyone point out what the issue is?

Thanks in advance

My sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/
LAPACK: /usr/lib/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14
Answer: bplapply with ShortRead functions
19 months ago by
Martin Morgan ♦♦ 23k (United States) wrote:

Does it help to use SnowParam() instead of MulticoreParam()? Otherwise, I think the processes may be accessing shared global variables in an unhealthy way.
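For readers who want to try that suggestion, here is a minimal sketch of swapping the back-end. SnowParam() starts separate R processes instead of forking the current one, so the workers do not inherit the parent's state. The worker function and the choice of two workers are placeholders for illustration, not part of the original code; in the function from the question, only the `param` line would need to change.

```r
library(BiocParallel)

## Assumption: two workers; adjust to the machine at hand.
param <- SnowParam(workers = 2, type = "SOCK")

## Placeholder task standing in for the per-destination work.
## It has no free variables, so nothing extra has to be shipped to the workers.
res <- bplapply(1:4, function(i) i^2, BPPARAM = param)

print(unlist(res))
```

Because the question's worker function calls everything through `ShortRead::`, it should not need the package attached on the workers, only installed.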


Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer and freed a few GB of memory, and the function is now working. So it needed more memory when run on multiple cores than on a single core. By a weird coincidence, yesterday afternoon I tried our RStudio server, a local server, and my own computer, and none of them had enough free memory to run it.
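If memory per worker really is the limiting factor, one thing that might help is asking FastqStreamer for smaller chunks through its `n` argument, since each worker in the function above holds one chunk from each of two streams at a time. The sketch below is only an illustration under that assumption: it writes a tiny synthetic FASTQ file, and `count_reads` and the chunk size of 2 are hypothetical names and values chosen for the example.

```r
library(ShortRead)

## Write a tiny synthetic FASTQ file (4 records) so the sketch is self-contained.
fq_path <- tempfile(fileext = ".fastq")
records <- unlist(lapply(1:4, function(i) {
    c(paste0("@read", i), "ACGTACGT", "+", "IIIIIIII")
}))
writeLines(records, fq_path)

## Hypothetical helper: count reads while holding at most `chunk_size`
## records in memory at any time.
count_reads <- function(path, chunk_size) {
    stream <- FastqStreamer(path, n = chunk_size)
    on.exit(close(stream))
    total <- 0L
    repeat {
        fq <- yield(stream)
        if (length(fq) == 0) break
        total <- total + length(fq)
    }
    total
}

n_reads <- count_reads(fq_path, chunk_size = 2)
print(n_reads)
```

A smaller `n` trades memory for more calls to yield(), so the right value depends on read length and on how many workers run at once.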

Reply written 19 months ago by Vivek.b