DESeq2 Different Results when using different foreach parallel backends
1
0
Entering edit mode
brnleehng • 0
@brnleehng-14444
Last seen 7.1 years ago

 

Hi I'm currently testing out DESeq2 with doAzureParallel, a cloud-enabled parallel foreach backend. However, I am not getting identical results when running it in sequential and doParallel order. Both have the same sessionInfo(). In both cases, the base mean is the same. I would think the inputs for each foreach iteration would produce the same outputs for both cases. How does DESeq2 use the foreach package?

# normal non-parallel run

library("DESeq2")

directory <- system.file("extdata", package="pasilla", mustWork=TRUE)

sampleFiles <- grep("treated",list.files(directory),value=TRUE)

sampleCondition <- sub("(.*treated).*","\\1",sampleFiles)

sampleTable <- data.frame(sampleName = sampleFiles,fileName = sampleFiles,condition = sampleCondition)

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)

ddsHTSeq

dds <- DESeq(ddsHTSeq)

res <- results(dds)

# using BiocParallel default backend

library("BiocParallel")

multicoreParam <- MulticoreParam(workers = 8)

dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)

res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)

identical(res,res.multicore) # same results as original

# with doAzureParallel

library(doAzureParallel)

setCredentials("credentials.json")

pool <- makeCluster("cluster4.json")

registerDoAzureParallel(pool)

getDoParWorkers()

doparam <- DoparParam()

doparam

doparam$workers <- 8

dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)

res.doAz <- results(dds,parallel = TRUE,BPPARAM = doparam)

identical(res,res.doAz) # different results than original

 

Output: No parallel backend 

                  baseMean log2FoldChange      lfcSE       stat    pvalue      padj

                 <numeric>      <numeric>  <numeric>  <numeric> <numeric> <numeric>

FBgn0000003:001  0.1561875    -0.02117133 0.08685755 -0.2437478 0.8074262        NA

FBgn0000008:001  0.0000000             NA         NA         NA        NA        NA

FBgn0000008:002  0.1876242     0.01712924 0.08685755  0.1972108 0.8436626        NA

Output: DoAzureParallel backend 

FBgn0000003:001  0.1561875     -0.9446086  3.814822 -0.2476154 0.8044320        NA

FBgn0000008:001  0.0000000             NA        NA         NA        NA        NA

FBgn0000008:002  0.1876242      0.7527265  3.816939  0.1972069 0.8436657        NA

 

Any suggestions? Solutions?

 

Thanks!

Brian

deseq2 biocparallel • 1.5k views
ADD COMMENT
0
Entering edit mode

I can confirm that it's the same version of DESeq2 now (v1.18.1). We are running it on the rocker/tidyverse container for all workers including the client. It seems strange that it's off by the 5th or 6th decimal point.

> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,directory = directory,design= ~ condition)

> ddsHTSeq

> dds <- DESeq(ddsHTSeq)

> res <- results(dds)

> library("BiocParallel")

> multicoreParam <- MulticoreParam(workers = 2)

> dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = multicoreParam)

> res.multicore <- results(dds,parallel = TRUE,BPPARAM = multicoreParam)

> identical(res,res.multicore) # same results as original

[1] TRUE

> library(doAzureParallel)

> setCredentials(credentials)

> pool <- makeCluster(clusterConfiguration)

> registerDoAzureParallel(pool)

> getDoParWorkers()

[1] 4

> doparam <- DoparParam()

> doparam

class: DoparParam

  bpisup: TRUE; bpnworkers: 4; bptasks: 0; bpjobname: BPJOB

  bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE

  bptimeout: 2592000; bpprogressbar: FALSE

> doparam$workers <- 4

> doAzure.dds <- DESeq(ddsHTSeq,parallel = TRUE,BPPARAM = doparam)

> res.Azure <- results(doAzure.dds, parallel = TRUE, BPPARAM = doparam)

> identical(res, res.Azure) # same results as original

[1] FALSE

> sum(signif(res$pvalue,digits = 5)!=signif(res.Azure$pvalue,digits=5),na.rm=TRUE)

[1] 0

> sum(signif(res$pvalue,digits = 6)!=signif(res.Azure$pvalue,digits=6),na.rm=TRUE)

[1] 23



> sessionInfo() [Local worker]

R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Debian GNU/Linux 9 (stretch)


Matrix products: default

BLAS: /usr/lib/openblas-base/libblas.so.3

LAPACK: /usr/lib/libopenblasp-r0.2.19.so


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C

[9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] parallel  stats4    stats     graphics  grDevices utils     datasets

[8] methods   base


other attached packages:

[1] doAzureParallel_0.6.2      iterators_1.0.8

[3] foreach_1.4.3              BiocParallel_1.12.0

[5] DESeq2_1.18.1              SummarizedExperiment_1.8.0

[7] DelayedArray_0.4.1         matrixStats_0.52.2

[9] Biobase_2.38.0             GenomicRanges_1.30.0

[11] GenomeInfoDb_1.14.0        IRanges_2.12.0

[13] S4Vectors_0.16.0           BiocGenerics_0.24.0

[15] BiocInstaller_1.28.0

> FUN <- function(i) {

+   #installed.packages()["DESeq2",]

+   library(DESeq2)

+   sessionInfo()

+ }

> bplapply(1, FUN, BPPARAM = doparam)


[[1]] SessionInfo for doAzureParallel Rocker/Tidyverse container

R version 3.4.3 (2017-11-30)

Platform: x86_64-pc-linux-gnu (64-bit)

Running under: Debian GNU/Linux 9 (stretch)


Matrix products: default

BLAS: /usr/lib/openblas-base/libblas.so.3

LAPACK: /usr/lib/libopenblasp-r0.2.19.so


locale:

[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C

[3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8

[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C

[7] LC_PAPER=en_US.UTF-8       LC_NAME=C

[9] LC_ADDRESS=C               LC_TELEPHONE=C

[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


attached base packages:

[1] parallel  stats4    methods   stats     graphics  grDevices utils

[8] datasets  base


other attached packages:

[1] DESeq2_1.18.1              SummarizedExperiment_1.8.0

[3] DelayedArray_0.4.1         matrixStats_0.52.2

[5] Biobase_2.38.0             GenomicRanges_1.30.0

[7] GenomeInfoDb_1.14.0        IRanges_2.12.0

[9] S4Vectors_0.16.0           BiocGenerics_0.24.0

[11] BiocParallel_1.12.0

 

Brian

ADD REPLY
0
Entering edit mode

I'm not sure exactly the cause here, I guess it could be further down the chain that there are different packages or libraries being used depending. I wouldn't worry too much about these differences in 6th significant digits for scientific purposes.

ADD REPLY
0
Entering edit mode

Maybe biocparallel should have an option and/or default to use doRNG?

https://cran.r-project.org/web/packages/doRNG/index.html

ADD REPLY
0
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

Can you confirm that the versions of DESeq2 are the same across all workers? It looks like some are using version < 1.16 where the default value was betaPrior=TRUE, where it is now FALSE.

ADD COMMENT

Login before adding your answer.

Traffic: 1248 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6