Question: Deseq metadata bug on repeat runs with parallel
0
gravatar for rbutler
7 weeks ago by
rbutler0
rbutler0 wrote:

I have encountered a curious bug that I haven't seen posted yet. When DESeq is rerun multiple times, it seems to multiply the metadata fields by the square of the MulticoreParam. By the second or third time it significantly grows the file size and slows the run time, fourth time it errors out. Can be fixed by setting metadata(dds) = metadata(dds)[1] after the repeated DESeq run.

library(DESeq2)
library(BiocParallel)
register(MulticoreParam(100))

# load a dds where DESeq has been run once
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds
class: DESeqDataSet
dim: 20492 57
metadata(1): version
... # normal rest of the dds data stuff

> # do some stuff and rerun DESeq
> design(dds) = ~ cell + day + trt + day:trt
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes longer than the first time
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50
metadata(10000): version version ... version version #######seems a lot!?
... # normal rest of the dds data stuff

# Run again changing nothing
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes much longer than the second time, 10-15 minutes
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50 ######## this line actually hangs for 30 seconds
metadata(100000000): version version ... version version ####### and is now 10k^2
... # normal rest of the dds data stuff

# so what is it?
> head(metadata(dds))
$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

# maybe its the parallel
> register(MulticoreParam(3))
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
...
final dispersion estimates, fitting model and testing: 3 workers
> dds
class: DESeqDataSet
dim: 20492 57
metadata(9): version version ... version version ########aha!
... # normal rest of the dds data stuff
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: RHEL

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GGally_1.4.0                sva_3.30.1
 [3] genefilter_1.64.0           mgcv_1.8-24
 [5] nlme_3.1-137                gridExtra_2.3
 [7] ggrepel_0.8.1               ggplot2_3.2.0
 [9] RColorBrewer_1.1-2          DESeq2_1.22.2
[11] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[13] BiocParallel_1.16.6         matrixStats_0.54.0
[15] Biobase_2.42.0              GenomicRanges_1.34.0
[17] GenomeInfoDb_1.18.2         IRanges_2.16.0
[19] S4Vectors_0.20.1            BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3
 [4] assertthat_0.2.1       BiocManager_1.30.4     latticeExtra_0.6-28
 [7] blob_1.2.0             GenomeInfoDbData_1.2.0 pillar_1.4.2
[10] RSQLite_2.1.1          backports_1.1.4        lattice_0.20-35
[13] limma_3.38.3           glue_1.3.1             digest_0.6.20
[16] XVector_0.22.0         checkmate_1.9.4        colorspace_1.4-1
[19] plyr_1.8.4             htmltools_0.3.6        Matrix_1.2-14
[22] XML_3.98-1.20          pkgconfig_2.0.2        zlibbioc_1.28.0
[25] purrr_0.3.2            xtable_1.8-4           scales_1.0.0
[28] htmlTable_1.13.1       tibble_2.1.3           annotate_1.60.1
[31] withr_2.1.2            nnet_7.3-12            lazyeval_0.2.2
[34] survival_2.42-3        magrittr_1.5           crayon_1.3.4
[37] memoise_1.1.0          foreign_0.8-70         tools_3.5.1
[40] data.table_1.12.2      stringr_1.4.0          locfit_1.5-9.1
[43] munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.44.0
[46] compiler_3.5.1         rlang_0.4.0            grid_3.5.1
[49] RCurl_1.95-4.12        rstudioapi_0.10        htmlwidgets_1.3
[52] bitops_1.0-6           tcltk_3.5.1            base64enc_0.1-3
[55] gtable_0.3.0           reshape_0.8.8          DBI_1.0.0
[58] R6_2.4.0               knitr_1.23             dplyr_0.8.3
[61] zeallot_0.1.0          bit_1.1-14             Hmisc_4.2-0
[64] stringi_1.4.3          Rcpp_1.0.1             geneplotter_1.60.0
[67] vctrs_0.2.0            rpart_4.1-13           acepack_1.4.1
[70] tidyselect_0.2.5       xfun_0.8
deseq2 • 79 views
ADD COMMENTlink modified 7 weeks ago by Michael Love26k • written 7 weeks ago by rbutler0
Answer: Deseq metadata bug on repeat runs with parallel
2
gravatar for Michael Love
7 weeks ago by
Michael Love26k
United States
Michael Love26k wrote:

Thanks for this report. It's a SummarizedExperiment feature:

> metadata(se) <- list(version=1)
> metadata(se2) <- list(version=1)
> rbind(se,se2)
class: SummarizedExperiment 
dim: 0 0 
metadata(2): version version
assays(0):
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):

I'll come up with a fix for DESeq2's use of rbind tomorrow and push to devel branch.

ADD COMMENTlink written 7 weeks ago by Michael Love26k

Thanks, I think I have a fix in devel to deal with rbind. It's v1.25.13.

Can you confirm that it looks good on your end?

install_github("mikelove/DESeq2")
ADD REPLYlink written 7 weeks ago by Michael Love26k

That was fun. I installed several github bioconductor packages before realizing I should probably have installed the devel version of Bioconductor in one step. Developer I am not.

> library(BiocParallel)
> register(MulticoreParam(3))
> library("airway")
> data("airway")
> ddsSE = DESeqDataSet (airway, design = ~ cell + dex)
> ddsSE = DESeq(ddsSE, parallel=T)
> ddsSE
class: DESeqDataSet 
dim: 64102 8 
metadata(2): NA version
...

Good on this end.

ADD REPLYlink written 6 weeks ago by rbutler0

Thanks. Oops I should have added dependencies=FALSE

ADD REPLYlink written 6 weeks ago by Michael Love26k

Even without dependencies It still throws the error for minimum version for S4Vectors and BiocGenerics during compilation, then has issues with Iranges having an array sum error and more. It appears there are some big changes for 3.10?

ADD REPLYlink written 6 weeks ago by rbutler0

It looks like you are a release behind? I think DESeq2 v1.24 and v1.25 are similar wrt objects.

ADD REPLYlink written 6 weeks ago by Michael Love26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 394 users visited in the last hour