Hi I'm currently testing out DESeq2 with doAzureParallel, a cloud-enabled parallel foreach backend. However, I am not getting identical results when running it in sequential and doParallel order. Both have the same sessionInfo(). In both cases, the base mean is the same. I would think the inputs for each foreach iteration would produce the same outputs for both cases. How does DESeq2 use the foreach package?
# normal non-parallel run
library("DESeq2")
directory <- system.file("extdata", package="pasilla", mustWork=TRUE)
sampleFiles <- grep("treated",list.files(directory),value=TRUE)
sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)
sampleTable <- data.frame(sampleName = sampleFiles,fileName = sampleFiles,condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
ddsHTSeq
dds <- DESeq(ddsHTSeq)
res <- results(dds)
# using BiocParallel default backend
library("BiocParallel")
multicoreParam <- MulticoreParam(workers = 8)
dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)
res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)
identical(res,res.multicore) # same results as original
# with doAzureParallel
library(doAzureParallel)
setCredentials("credentials.json")
pool <- makeCluster("cluster4.json")
registerDoAzureParallel(pool)
getDoParWorkers()
doparam <- DoparParam()
doparam
doparam$workers <- 8
dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)
res.doAz <- results(dds,parallel = TRUE,BPPARAM = doparam)
identical(res,res.doAz) # different results than original
Output: No parallel backend
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
FBgn0000003:001 0.1561875 -0.02117133 0.08685755 -0.2437478 0.8074262 NA
FBgn0000008:001 0.0000000 NA NA NA NA NA
FBgn0000008:002 0.1876242 0.01712924 0.08685755 0.1972108 0.8436626 NA
Output: DoAzureParallel backend
FBgn0000003:001 0.1561875 -0.9446086 3.814822 -0.2476154 0.8044320 NA
FBgn0000008:001 0.0000000 NA NA NA NA NA
FBgn0000008:002 0.1876242 0.7527265 3.816939 0.1972069 0.8436657 NA
Any suggestions? Solutions?
Thanks!
Brian

I can confirm that it's the same version of DESeq2 now (v1.18.1). We are running it on the rocker/tidyverse container for all workers including the client. It seems strange that it's off by the 5th or 6th decimal point.
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)
> ddsHTSeq > dds <- DESeq(ddsHTSeq) > res <- results(dds) > library("BiocParallel") > multicoreParam <- MulticoreParam(workers = 2) > dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam) > res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam) > identical(res,res.multicore) # same results as original [1] TRUE > library(doAzureParallel) > setCredentials(credentials) > pool <- makeCluster(clusterConfiguration) > registerDoAzureParallel(pool) > getDoParWorkers() [1] 4 > doparam <- DoparParam() > doparam class: DoparParam bpisup: TRUE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE bptimeout: 2592000; bpprogressbar: FALSE > doparam$workers <- 4 > doAzure.dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam) > res.Azure <- results(doAzure.dds, parallel = TRUE, BPPARAM = doparam) > identical(res, res.Azure) # same results as original [1] FALSE > sum(signif(res$pvalue,digits = 5)!=signif(res.Azure$pvalue,digits=5),na.rm=TRUE) [1] 0 > sum(signif(res$pvalue,digits = 6)!=signif(res.Azure$pvalue,digits=6),na.rm=TRUE) [1] 23 > sessionInfo() [Local worker] R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 9 (stretch) Matrix products: default BLAS: /usr/lib/openblas-base/libblas.so.3 LAPACK: /usr/lib/libopenblasp-r0.2.19.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] doAzureParallel_0.6.2 iterators_1.0.8 [3] foreach_1.4.3 BiocParallel_1.12.0 [5] DESeq2_1.18.1 SummarizedExperiment_1.8.0 [7] DelayedArray_0.4.1 matrixStats_0.52.2 [9] Biobase_2.38.0 GenomicRanges_1.30.0 [11] GenomeInfoDb_1.14.0 IRanges_2.12.0 [13] S4Vectors_0.16.0 BiocGenerics_0.24.0 [15] BiocInstaller_1.28.0 > FUN <- function(i) { + #installed.packages()["DESeq2",] + library(DESeq2) + sessionInfo() + } > bplapply(1, FUN, BPPARAM = doparam) [[1]] SessionInfo for doAzureParallel Rocker/Tidyverse container R version 3.4.3 (2017-11-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Debian GNU/Linux 9 (stretch) Matrix products: default BLAS: /usr/lib/openblas-base/libblas.so.3 LAPACK: /usr/lib/libopenblasp-r0.2.19.so locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats4 methods stats graphics grDevices utils [8] datasets base other attached packages: [1] DESeq2_1.18.1 SummarizedExperiment_1.8.0 [3] DelayedArray_0.4.1 matrixStats_0.52.2 [5] Biobase_2.38.0 GenomicRanges_1.30.0 [7] GenomeInfoDb_1.14.0 IRanges_2.12.0 [9] S4Vectors_0.16.0 BiocGenerics_0.24.0 [11] BiocParallel_1.12.0Brian
I'm not sure exactly the cause here, I guess it could be further down the chain that there are different packages or libraries being used depending. I wouldn't worry too much about these differences in 6th significant digits for scientific purposes.
Maybe biocparallel should have an option and/or default to use doRNG?
https://cran.r-project.org/web/packages/doRNG/index.html