Hi I'm currently testing out DESeq2 with doAzureParallel, a cloud-enabled parallel foreach backend. However, I am not getting identical results when running it in sequential and doParallel order. Both have the same sessionInfo(). In both cases, the base mean is the same. I would think the inputs for each foreach iteration would produce the same outputs for both cases. How does DESeq2 use the foreach package?
# normal non-parallel run
library("DESeq2")
directory <- system.file("extdata", package="pasilla", mustWork=TRUE)
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,fileName = sampleFiles,condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
ddsHTSeq
dds <- DESeq(ddsHTSeq)
res <- results(dds)
# using BiocParallel default backend
library("BiocParallel")
multicoreParam <- MulticoreParam(workers = 8)
dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)
res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)
identical(res,res.multicore) # same results as original
# with doAzureParallel
library(doAzureParallel)
setCredentials("credentials.json")
pool <- makeCluster("cluster4.json")
registerDoAzureParallel(pool)
getDoParWorkers()
doparam <- DoparParam()
doparam
doparam$workers <- 8
dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)
res.doAz <- results(dds,parallel = TRUE,BPPARAM = doparam)
identical(res,res.doAz) # different results than original
Output: No parallel backend
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
FBgn0000003:001 0.1561875 -0.02117133 0.08685755 -0.2437478 0.8074262 NA
FBgn0000008:001 0.0000000 NA NA NA NA NA
FBgn0000008:002 0.1876242 0.01712924 0.08685755 0.1972108 0.8436626 NA
Output: DoAzureParallel backend
FBgn0000003:001 0.1561875 -0.9446086 3.814822 -0.2476154 0.8044320 NA
FBgn0000008:001 0.0000000 NA NA NA NA NA
FBgn0000008:002 0.1876242 0.7527265 3.816939 0.1972069 0.8436657 NA
Any suggestions? Solutions?
Thanks!
Brian
I can confirm that it's the same version of DESeq2 now (v1.18.1). We are running it on the rocker/tidyverse container for all workers including the client. It seems strange that it's off by the 5th or 6th decimal point.
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
Brian
I'm not sure exactly the cause here, I guess it could be further down the chain that there are different packages or libraries being used depending. I wouldn't worry too much about these differences in 6th significant digits for scientific purposes.
Maybe biocparallel should have an option and/or default to use doRNG?
https://cran.r-project.org/web/packages/doRNG/index.html