bplapply with ShortRead functions
1
0
Entering edit mode
Vivek.b ▴ 100
@vivekb-7661
Last seen 4.6 years ago
Germany

Hey everyone

I am facing an issue with using ShortRead functions with bplapply. I am trying to demultiplex a fastq file. Following is an example function, which does nothing but creates multiple paired-end fastq files, all containing same reads :

demultiplex_fastq <- function(fastq_R1, fastq_R2, destinations, outdir, ncores = 1) {

    param = BiocParallel::MulticoreParam(workers = ncores)
    message("de-multiplexing the FASTQ file")
    ## filter and write
    info <- BiocParallel::bplapply(seq_along(destinations), function(i){
        split1 <- file.path(outdir, paste0(destinations[i],"_R1.fastq.gz"))
        split2 <- file.path(outdir, paste0(destinations[i],"_R2.fastq.gz"))
        print(split1)
        print(split2)
        ## open input stream
        stream_R1 <- ShortRead::FastqStreamer(fastq_R1)
        stream_R2 <- ShortRead::FastqStreamer(fastq_R2)
        on.exit(close(stream_R1))
        on.exit(close(stream_R2), add = TRUE)
        repeat {
            fq_R1 <- ShortRead::yield(stream_R1)
            fq_R2 <- ShortRead::yield(stream_R2)
            if (length(fq_R1) == 0) {
                break
            }
            id2keep <- 1:10
            ShortRead::writeFastq(fq_R1[id2keep], split1, "a")
            ShortRead::writeFastq(fq_R2[id2keep], split2, "a")
        }
        return("Done!")
    }, BPPARAM = param)

    return("Done!")
}

This does work when I use single core (writes the files), but gets stuck when I use >1 cores. Can anyone point out what's the issue here.

Thanks in advance

My sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.1 tools_3.4.1    yaml_2.1.14
shortread biocparallel • 1.5k views
ADD COMMENT
1
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States

Does it help to use SnowParam() instead of MulticoreParam() ? Otherwise, I think the processes may be accessing shared global variables in an unhealthy way.

ADD COMMENT
0
Entering edit mode

Thanks Martin. I think the issue was the second one you mentioned. Today I killed some processes on my computer and freed a few GBs of memory, and the function is now working. So it needed more memory when executed multi-core than on single core. Weird coincidence for me, yesterday I tried on our Rstudio server, local server and on my computer and none of them had enough free memory yesterday afternoon to make it run..

ADD REPLY

Login before adding your answer.

Traffic: 422 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6