Question: BiocParallel and CopywriteR Error
8 months ago, genomic8328 wrote:

I recently tried to run CopywriteR on a Microsoft Azure Windows Server Datacenter virtual machine (128 GB RAM, 16 cores) with R 3.3.2. My input BAM files are 12.67 GB (normal) and 11 GB (tumor).

I received the following error:
Error: 'bplapply' receive data failed: error reading from connection

Can you suggest a workaround? Maybe too many BAM lines are being read at once?

Here is my code:

library(CopywriteR)
library(BiocParallel)  # provides SnowParam() and bplapply()

data.folder <- tools::file_path_as_absolute(file.path(getwd()))
preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg38", prefix="chr")

list.dirs(path=file.path(data.folder), full.names=FALSE)
list.files(path=file.path(data.folder, "hg38_20kb_chr"), full.names=FALSE)
load(file=file.path(data.folder, "hg38_20kb_chr", "blacklist.rda"))

load(file=file.path(data.folder, "hg38_20kb_chr", "GC_mappability.rda"))
bp.param <- SnowParam(workers = 15, type ="SOCK")

path <- c("C:/Users/m/Desktop/share/data")
samples <- list.files(path=path, pattern="tumor.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="normal.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)

CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg38_20kb_chr"), bp.param = bp.param)

8 months ago, t.kuilman wrote:

I am not sure whether this is an issue with CopywriteR. I think it might be an issue with BiocParallel (the package in which the bplapply function is defined) and/or a memory issue. I hope someone else can help with this.

8 months ago, Martin Morgan wrote:

My guess is that the amount of data being returned by the workers is too large to be represented in a serialized vector; I think the limit is probably 2^31 - 1 elements. traceback() might help show where things are going wrong, and using SerialParam() would be a workaround (though it obviously thwarts parallel evaluation).
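
A minimal sketch of the SerialParam() workaround, assuming BiocParallel is installed. The function toy_task is a hypothetical stand-in for the real per-sample work, and the commented-out CopywriteR() call simply reuses the arguments from the question; whether this actually avoids the error in that setup has not been verified here.

```r
library(BiocParallel)

# 2^31 - 1 is the largest value R's integer type can hold,
# which is the element limit guessed at above.
stopifnot(.Machine$integer.max == 2^31 - 1)

# Hypothetical stand-in for the per-sample work done inside CopywriteR().
toy_task <- function(i) i^2

# SerialParam() evaluates every task in the current R process, so results
# do not have to be sent back from worker processes over a socket.
serial.param <- SerialParam()
res <- bplapply(1:3, toy_task, BPPARAM = serial.param)

# In the script from the question, the same object would take the place of
# the SnowParam (left commented out because it needs the BAM files):
# CopywriteR(sample.control = sample.control,
#            destination.folder = file.path(data.folder),
#            reference.folder = file.path(data.folder, "hg38_20kb_chr"),
#            bp.param = serial.param)
# If an error still occurs, calling traceback() immediately afterwards
# prints the call stack that led to it.
```

A possible middle ground, also untested here, would be to keep SnowParam but lower the workers argument so that fewer results are in flight at once.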
