Question

DESeq2 Parallel Computing Stall on Server

coyoung ▴ 10

@coyoung-17963

Last seen 5.4 years ago

I have a question about using parallel computing for DESeq2 on a server. I have been working with this code for the last few days and have not been able to find a remedy. I have been following the DESeq2 online manual for RNA-seq analysis, but the run on the server seems to take longer than it does on my personal computer. Please let me know if there is anything else I need to send.

$ screen -S DESeq2


### DESeq2 Analysis in Server ###

setwd('/projects/home/cyoung304/data')

library('DESeq2')

cts <- read.csv(file='GeneName.csv')

nrow(cts)

ncol(cts)

colData <- read.csv(file='colData.csv')

ncol(colData)

nrow(colData)

rownames(cts) <- cts$Geneid

cts$Geneid <- NULL

library("BiocParallel")

register(MulticoreParam(12))

dds <- DESeqDataSetFromMatrix(countData=cts, colData=colData, design= ~ patient + condition)

keep <- rowSums(counts(dds) >= 10) >= 5

dds <- dds[keep,]

dds$condition <- relevel(dds$condition, ref='NT')

# Michael said it doesn't matter what the reference is because the comparison between the samples remains the same

ddsColl <- collapseReplicates(dds, dds$id)

ddsColl <- DESeq(ddsColl, fitType='local', parallel = TRUE)

resultsNames(ddsColl)

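# Use the fitted, collapsed object; with NT as the reference level the coefficient is named condition_MPT_vs_NT (confirm with resultsNames above)
res <- results(ddsColl, name="condition_MPT_vs_NT", lfcThreshold = 0.585, alpha=0.05)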

res

resOrder <- res[order(res$padj),]

write.csv(as.data.frame(resOrder), file='DESeqResults.csv')

deseq2 • 1.3k views

ADD COMMENT • link updated 6.5 years ago by Michael Love 43k • written 6.5 years ago by coyoung ▴ 10

Make sure that the basics are working first, e.g.,

register(MulticoreParam(2))
bplapply(1:2, sqrt)

If not, check out the manager.hostname and manager.port arguments to MulticoreParam (this could be tricky, since you need to find out which ports, if any, are open on the cluster).

ADD REPLY • link 6.5 years ago Martin Morgan 25k

library("BiocParallel")

register(MulticoreParam(2))

bplapply(1:2, sqrt)

[[1]]

[1] 1

[[2]]

[1] 1.414214

ADD REPLY • link 6.5 years ago coyoung ▴ 10

score 0 · Answer 1 · 2019-01-02

Michael Love 43k

@mikelove

Last seen 10 days ago

United States

How many samples do you have? How long does it take with one core?

Due to the way that the parallel backends work, it's sometimes faster to run with fewer cores, as there is overhead in sending large datasets to 12 cores, and R will often eventually end up duplicating the memory (long backstory on this, which can be found on support site threads).
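One way to check whether extra cores help is to time DESeq() on a small dataset with different worker counts before committing to a full run. The following is a minimal sketch, not a benchmark of the original data: the simulated dataset from makeExampleDESeqDataSet, the helper name time_deseq, and the worker counts 1, 2 and 4 are all assumptions made for illustration.

```r
library(DESeq2)
library(BiocParallel)

# Simulated data stands in for the real count matrix (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Hypothetical helper: elapsed seconds for DESeq() with a given number of workers.
# One worker uses the ordinary serial code path.
time_deseq <- function(workers) {
  system.time({
    if (workers == 1) {
      DESeq(dds, quiet = TRUE)
    } else {
      DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(workers), quiet = TRUE)
    }
  })[["elapsed"]]
}

timings <- sapply(c(1, 2, 4), time_deseq)
names(timings) <- c("1", "2", "4")
print(timings)
```

With real data, the same comparison could be run on a subset of genes, for example dds[1:2000, ], to see where adding workers stops paying off.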

ADD COMMENT • link 6.5 years ago Michael Love 43k

I have 125 samples in total (68 normal, 57 condition). Without registering the cores at the beginning of the analysis, DESeq2 completed in about 3.5 to 4 hours.

ADD REPLY • link 6.5 years ago coyoung ▴ 10

It is slow because the design matrix is 125 x ~60, with every patient getting their own coefficient, which is pretty large, and the GLM needs to be solved iteratively for each gene. For datasets this large, I tend to use limma-voom, which is much faster because it avoids iteratively solving for the coefficients with these large design matrices.
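For readers who have not used limma-voom, a rough sketch of the same ~ patient + condition design is shown here. It is not code from this thread: the counts are simulated, the 20 paired patients and 500 genes are assumptions chosen to keep the example small, and the filtering and normalization steps follow the usual edgeR/limma workflow rather than anything specified above.

```r
library(edgeR)  # provides DGEList, filterByExpr, calcNormFactors
library(limma)  # provides voom, lmFit, eBayes, topTable

set.seed(1)
n_patients <- 20   # assumed size, much smaller than the real study
n_genes <- 500

# Hypothetical sample table: each patient contributes one NT and one MPT sample.
colData <- data.frame(
  patient = factor(rep(paste0("P", seq_len(n_patients)), each = 2)),
  condition = factor(rep(c("NT", "MPT"), times = n_patients),
                     levels = c("NT", "MPT"))
)

# Simulated negative binomial counts, genes in rows and samples in columns.
cts <- matrix(rnbinom(n_genes * nrow(colData), mu = 100, size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# One coefficient per patient plus one for condition, as in the design discussed above.
design <- model.matrix(~ patient + condition, data = colData)
dim(design)  # samples x (patients + 1)

dge <- DGEList(counts = cts)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

# voom estimates precision weights; lmFit then solves a weighted linear model per gene.
v <- voom(dge, design)
fit <- eBayes(lmFit(v, design))
tab <- topTable(fit, coef = "conditionMPT", number = Inf)
head(tab)
```

The linear model fit is a closed-form weighted least squares solve per gene, which is why it scales better than an iterative GLM fit when the design matrix has many columns.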

ADD REPLY • link 6.5 years ago Michael Love 43k