Question: bplapply with ShortRead functions
Vivek.b90 (Germany) wrote, 9 months ago:

Hey everyone

I am facing an issue using ShortRead functions with bplapply. I am trying to demultiplex a FASTQ file. The following example function does nothing except create multiple paired-end FASTQ files that all contain the same reads:

demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {

    param <- BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write: one task per destination
    info <- BiocParallel::bplapply(seq_along(destinations), function(i){
        split1 <- file.path(outdir, paste0(destinations[i],"_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i],"_R2.fastq.gz"))
        print(split1)
        print(split2)
        ## open input streams; each task reads both files from the start
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        on.exit(close(stream_R1))
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            ## read the next chunk of records from each mate file
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            if (length(fq_R1) == 0) {
                break
            }
            ## keep at most the first 10 reads (the last chunk may be shorter)
            id2keep <- seq_len(min(10, length(fq_R1)))
            ## append ("a") to the gzipped output files
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
        return("Done!")
    }, BPPARAM = param)

    return("Done!")
}

This works when I use a single core (the files are written), but it hangs when I use more than one core. Can anyone point out what the issue is?

Thanks in advance

My sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14
Martin Morgan (United States) wrote, 9 months ago:

Does it help to use SnowParam() instead of MulticoreParam()? Otherwise, I think the processes may be accessing shared global variables in an unhealthy way.
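A minimal sketch of the SnowParam() suggestion, assuming ShortRead and BiocParallel are installed wherever the workers run. The names `write_first_reads` and `demultiplex_fastq_snow` are hypothetical. SnowParam workers are separate R sessions rather than forked copies of the parent, so the sketch moves the worker function to top level, passes every input through bplapply()'s `...` instead of relying on the enclosing environment, and attaches ShortRead inside the worker so that `close()` dispatches on the streamer objects.

```r
## Hypothetical worker: writes at most the first 10 reads of every chunk to
## one destination. All inputs arrive as arguments (nothing is shared).
write_first_reads <- function(i, fastq_R1, fastq_R2, destinations, outdir) {
    ## SnowParam workers start as fresh R sessions: attach ShortRead here
    suppressPackageStartupMessages(library(ShortRead))
    split1 <- file.path(outdir, paste0(destinations[i], "_R1.fastq.gz"))
    split2 <- file.path(outdir, paste0(destinations[i], "_R2.fastq.gz"))
    stream_R1 <- FastqStreamer(fastq_R1)
    on.exit(close(stream_R1))
    stream_R2 <- FastqStreamer(fastq_R2)
    on.exit(close(stream_R2), add = TRUE)
    repeat {
        fq_R1 <- yield(stream_R1)
        fq_R2 <- yield(stream_R2)
        if (length(fq_R1) == 0L) break
        id2keep <- seq_len(min(10L, length(fq_R1)))
        writeFastq(fq_R1[id2keep], split1, mode = "a")
        writeFastq(fq_R2[id2keep], split2, mode = "a")
    }
    c(split1, split2)
}

## Hypothetical driver: same task split as the original, but with SnowParam
demultiplex_fastq_snow <- function(fastq_R1, fastq_R2, destinations, outdir,
                                   ncores = 1) {
    param <- BiocParallel::SnowParam(workers = ncores)
    BiocParallel::bplapply(seq_along(destinations), write_first_reads,
                           fastq_R1 = fastq_R1, fastq_R2 = fastq_R2,
                           destinations = destinations, outdir = outdir,
                           BPPARAM = param)
}
```

If this version runs with several workers while the MulticoreParam() version hangs, that points toward state shared through forking rather than the ShortRead calls themselves.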


Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer, freeing a few GB of memory, and the function now works. So it seems to need more memory when run on multiple cores than on a single core. A weird coincidence for me: yesterday afternoon I tried it on our RStudio server, our local server, and my own computer, and none of them had enough free memory to run it.

Reply by Vivek.b90, 9 months ago.
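If memory is the limiting factor, note that every task opens its own pair of FastqStreamer objects, and each yield() returns up to `n` records (1e6 by default), so peak usage grows roughly with the number of workers. The sketch below lowers `n` to trade more loop iterations for a smaller footprint per worker. `write_chunked` and `chunk_size` are hypothetical names, and the default of 1e5 records is an arbitrary assumption, not a tuned value.

```r
## Hypothetical helper: same filtering loop as the question, but with a
## configurable number of records per yield() to cap memory per worker.
write_chunked <- function(fastq_R1, fastq_R2, split1, split2,
                          chunk_size = 1e5) {
    suppressPackageStartupMessages(library(ShortRead))
    ## `n` is the number of records FastqStreamer returns on each yield()
    stream_R1 <- FastqStreamer(fastq_R1, n = chunk_size)
    on.exit(close(stream_R1))
    stream_R2 <- FastqStreamer(fastq_R2, n = chunk_size)
    on.exit(close(stream_R2), add = TRUE)
    repeat {
        fq_R1 <- yield(stream_R1)
        fq_R2 <- yield(stream_R2)
        if (length(fq_R1) == 0L) break
        id2keep <- seq_len(min(10L, length(fq_R1)))
        writeFastq(fq_R1[id2keep], split1, mode = "a")
        writeFastq(fq_R2[id2keep], split2, mode = "a")
    }
    invisible(c(split1, split2))
}
```

Inside the original bplapply() call, the worker body could then reduce to `write_chunked(fastq_R1, fastq_R2, split1, split2)`; the resulting file contents change with `chunk_size`, since the 10-read selection is applied per chunk.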