Question: bplapply with ShortRead functions
5 months ago, Vivek.b wrote:

Hey everyone

I am having trouble using ShortRead functions with bplapply while trying to demultiplex a FASTQ file. The following example function does nothing but create multiple paired-end FASTQ files, all containing the same reads:

demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {

    param <- BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write: one task per destination
    info <- BiocParallel::bplapply(seq_along(destinations), function(i) {
        split1 <- file.path(outdir, paste0(destinations[i], "_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i], "_R2.fastq.gz"))
        ## open input streams; close both when this task exits
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        on.exit(close(stream_R1), add = TRUE)
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            ## stop once the input is exhausted
            if (length(fq_R1) == 0) {
                break
            }
            id2keep <- 1:10
            ## append the selected reads to this destination's files
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
    }, BPPARAM = param)
}


This works when I use a single core (it writes the files), but it gets stuck when I use more than one core. Can anyone point out what the issue is?

Thanks in advance

My sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/
LAPACK: /usr/lib/lapack/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14
5 months ago, Martin Morgan (United States) wrote:

Does it help to use SnowParam() instead of MulticoreParam()? Otherwise, I think the processes may be accessing shared global variables in an unhealthy way.
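
For anyone wanting to try that swap, here is a minimal sketch of running bplapply with SnowParam(). SnowParam() workers are separate R processes rather than forks of the current session, so they do not inherit the parent's global variables; everything the worker needs is passed as an argument. The helper write_one() and the sample names are hypothetical stand-ins for the per-destination work in the question, not part of the original function.

```r
library(BiocParallel)

## Hypothetical stand-in for the per-destination work: writes one small
## text file and returns its path. All inputs arrive as arguments, so the
## function does not depend on variables in the calling session.
write_one <- function(i, destinations, outdir) {
    out <- file.path(outdir, paste0(destinations[i], ".txt"))
    writeLines(paste("sample", destinations[i]), out)
    out
}

destinations <- c("A", "B", "C")   # made-up sample names
outdir <- tempfile("demux_")
dir.create(outdir)

## SnowParam() starts independent worker processes instead of forking
param <- SnowParam(workers = 2)
written <- bplapply(seq_along(destinations), write_one,
                    destinations = destinations, outdir = outdir,
                    BPPARAM = param)
```

In the function from the question, the equivalent change would be replacing MulticoreParam(workers = ncores) with SnowParam(workers = ncores); the package-qualified calls (ShortRead::, BiocParallel::) already work in fresh worker processes as long as the packages are installed there.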


Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer and freed a few GB of memory, and the function now works. So it needed more memory when run on multiple cores than on a single core. It was a weird coincidence: yesterday I tried our RStudio server, a local server, and my own computer, and none of them had enough free memory that afternoon to run it.

Reply written 5 months ago by Vivek.b
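
If memory per worker is the limiting factor, one thing that may help is a smaller chunk size. Each worker opens its own FastqStreamer, and the `n` argument controls how many records each yield() returns, so a smaller `n` should mean fewer reads held in memory at once. The sketch below is only an illustration under that assumption: the FASTQ file is synthetic (made-up reads written to a temporary file) and the chunk size of 100 is arbitrary.

```r
library(ShortRead)

## Write a tiny synthetic FASTQ file (made-up reads, illustration only)
fq_file <- tempfile(fileext = ".fastq")
n_reads <- 250
records <- unlist(lapply(seq_len(n_reads), function(i) {
    c(paste0("@read", i), "ACGTACGTAC", "+", "IIIIIIIIII")
}))
writeLines(records, fq_file)

## 'n' is the number of records returned per yield(); smaller values
## keep fewer reads in memory per chunk
stream <- FastqStreamer(fq_file, n = 100)
chunk_sizes <- integer()
repeat {
    fq <- yield(stream)
    if (length(fq) == 0) break      # input exhausted
    chunk_sizes <- c(chunk_sizes, length(fq))
}
close(stream)
```

Combining a smaller `n` with fewer workers trades speed for a lower peak memory footprint; whether that is enough depends on the size of the input files and the machine.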