Question: DESeq2 Different Results when using different foreach parallel backends
0
gravatar for brnleehng
19 months ago by
brnleehng0
brnleehng0 wrote:

 

Hi I'm currently testing out DESeq2 with doAzureParallel, a cloud-enabled parallel foreach backend. However, I am not getting identical results when running it in sequential and doParallel order. Both have the same sessionInfo(). In both cases, the base mean is the same. I would think the inputs for each foreach iteration would produce the same outputs for both cases. How does DESeq2 use the foreach package?

# normal non-parallel run

library("DESeq2")

directory <- system.file("extdata", package="pasilla", mustWork=TRUE)

sampleFiles <- grep("treated",list.files(directory),value=TRUE)

sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)

sampleTable <- data.frame(sampleName = sampleFiles,fileName = sampleFiles,condition = sampleCondition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)

ddsHTSeq

dds <- DESeq(ddsHTSeq)

res <- results(dds)

# using BiocParallel default backend

library("BiocParallel")

multicoreParam <- MulticoreParam(workers = 8)

dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)

res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)

identical(res,res.multicore) # same results as original

# with doAzureParallel

library(doAzureParallel)

setCredentials("credentials.json")

pool <- makeCluster("cluster4.json")

registerDoAzureParallel(pool)

getDoParWorkers()

doparam <- DoparParam()

doparam

doparam$workers <- 8

dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)

res.doAz <- results(dds,parallel = TRUE,BPPARAM = doparam)

identical(res,res.doAz) # different results than original

 

Output: No parallel backend 

                  baseMean log2FoldChange      lfcSE       stat    pvalue      padj

                 <numeric>      <numeric>  <numeric>  <numeric> <numeric> <numeric>

FBgn0000003:001  0.1561875    -0.02117133 0.08685755 -0.2437478 0.8074262        NA

FBgn0000008:001  0.0000000             NA         NA         NA        NA        NA

FBgn0000008:002  0.1876242     0.01712924 0.08685755  0.1972108 0.8436626        NA

Output: DoAzureParallel backend 

FBgn0000003:001  0.1561875     -0.9446086  3.814822 -0.2476154 0.8044320        NA

FBgn0000008:001  0.0000000             NA        NA         NA        NA        NA

FBgn0000008:002  0.1876242      0.7527265  3.816939  0.1972069 0.8436657        NA

 

Any suggestions? Solutions?

 

Thanks!

Brian

deseq2 biocparallel • 440 views
ADD COMMENTlink modified 18 months ago • written 19 months ago by brnleehng0

I can confirm that it's the same version of DESeq2 now (v1.18.1). We are running it on the rocker/tidyverse container for all workers including the client. It seems strange that it's off by the 5th or 6th decimal point.

> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)

> ddsHTSeq

> dds <- DESeq(ddsHTSeq)

> res <- results(dds)

> library("BiocParallel")

> multicoreParam <- MulticoreParam(workers = 2)

> dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)

> res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)

> identical(res,res.multicore) # same results as original

[1] TRUE

> library(doAzureParallel)

> setCredentials(credentials)

> pool <- makeCluster(clusterConfiguration)

> registerDoAzureParallel(pool)

> getDoParWorkers()

[1] 4

> doparam <- DoparParam()

> doparam

class: DoparParam

  bpisup: TRUE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB

  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE

  bptimeout: 2592000; bpprogressbar: FALSE

> doparam$workers <- 4

> doAzure.dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)

> res.Azure <- results(doAzure.dds, parallel = TRUE, BPPARAM = doparam)

> identical(res, res.Azure) # same results as original

[1] FALSE

> sum(signif(res$pvalue,digits = 5)!=signif(res.Azure$pvalue,digits=5),na.rm=TRUE)

[1] 0

> sum(signif(res$pvalue,digits = 6)!=signif(res.Azure$pvalue,digits=6),na.rm=TRUE)

[1] 23



> sessionInfo() [Local worker]

R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Debian GNU/Linux 9 (stretch)


Matrix products: default

BLAS: /usr/lib/openblas-base/libblas.so.3

LAPACK: /usr/lib/libopenblasp-r0.2.19.so


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C

[9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base


other attached packages:

[1] doAzureParallel_0.6.2      iterators_1.0.8

[3] foreach_1.4.3              BiocParallel_1.12.0

[5] DESeq2_1.18.1              SummarizedExperiment_1.8.0

[7] DelayedArray_0.4.1         matrixStats_0.52.2

[9] Biobase_2.38.0             GenomicRanges_1.30.0

[11] GenomeInfoDb_1.14.0        IRanges_2.12.0

[13] S4Vectors_0.16.0           BiocGenerics_0.24.0

[15] BiocInstaller_1.28.0

> FUN <- function(i) {

+   #installed.packages()["DESeq2",]

+   library(DESeq2)

+   sessionInfo()

+ }

> bplapply(1, FUN, BPPARAM = doparam)


[[1]] SessionInfo for doAzureParallel Rocker/Tidyverse container

R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Debian GNU/Linux 9 (stretch)


Matrix products: default

BLAS: /usr/lib/openblas-base/libblas.so.3

LAPACK: /usr/lib/libopenblasp-r0.2.19.so


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C

[9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] parallel  stats4    methods   stats     graphics  grDevices utils

[8] datasets  base


other attached packages:

[1] DESeq2_1.18.1              SummarizedExperiment_1.8.0

[3] DelayedArray_0.4.1         matrixStats_0.52.2

[5] Biobase_2.38.0             GenomicRanges_1.30.0

[7] GenomeInfoDb_1.14.0        IRanges_2.12.0

[9] S4Vectors_0.16.0           BiocGenerics_0.24.0

[11] BiocParallel_1.12.0

 

Brian

ADD REPLYlink written 18 months ago by brnleehng0

I'm not sure exactly the cause here, I guess it could be further down the chain that there are different packages or libraries being used depending. I wouldn't worry too much about these differences in 6th significant digits for scientific purposes.

ADD REPLYlink written 18 months ago by Michael Love24k

Maybe biocparallel should have an option and/or default to use doRNG?

https://cran.r-project.org/web/packages/doRNG/index.html

ADD REPLYlink written 18 months ago by Ryan C. Thompson7.3k
Answer: DESeq2 Different Results when using different foreach parallel backends
0
gravatar for Michael Love
19 months ago by
Michael Love24k
United States
Michael Love24k wrote:

Can you confirm that the versions of DESeq2 are the same across all workers? It looks like some are using version < 1.16 where the default value was betaPrior=TRUE, where it is now FALSE.

ADD COMMENTlink written 19 months ago by Michael Love24k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 146 users visited in the last hour