Deseq metadata bug on repeat runs with parallel
1
0
Entering edit mode
rbutler • 0
@rbutler-20667
Last seen 4.1 years ago

I have encountered a curious bug that I haven't seen posted yet. When DESeq is rerun multiple times, it seems to multiply the metadata fields by the square of the MulticoreParam. By the second or third time it significantly grows the file size and slows the run time, fourth time it errors out. Can be fixed by setting metadata(dds) = metadata(dds)[1] after the repeated DESeq run.

library(DESeq2)
library(BiocParallel)
register(MulticoreParam(100))

# load a dds where DESeq has been run once
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds
class: DESeqDataSet
dim: 20492 57
metadata(1): version
... # normal rest of the dds data stuff

> # do some stuff and rerun DESeq
> design(dds) = ~ cell + day + trt + day:trt
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes longer than the first time
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50
metadata(10000): version version ... version version #######seems a lot!?
... # normal rest of the dds data stuff

# Run again changing nothing
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
... # this takes much longer than the second time, 10-15 minutes
final dispersion estimates, fitting model and testing: 100 workers
> dds
class: DESeqDataSet
dim: 20492 50 ######## this line actually hangs for 30 seconds
metadata(100000000): version version ... version version ####### and is now 10k^2
... # normal rest of the dds data stuff

# so what is it?
> head(metadata(dds))
$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

$version
[1] ‘1.22.2’

# maybe its the parallel
> register(MulticoreParam(3))
> load(paste0(rootdir, '/comparisons/dsq_broad_analysis/', load_dds))
> dds = DESeq(dds, parallel=T)
using pre-existing normalization factors
...
final dispersion estimates, fitting model and testing: 3 workers
> dds
class: DESeqDataSet
dim: 20492 57
metadata(9): version version ... version version ########aha!
... # normal rest of the dds data stuff
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: RHEL

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] GGally_1.4.0                sva_3.30.1
 [3] genefilter_1.64.0           mgcv_1.8-24
 [5] nlme_3.1-137                gridExtra_2.3
 [7] ggrepel_0.8.1               ggplot2_3.2.0
 [9] RColorBrewer_1.1-2          DESeq2_1.22.2
[11] SummarizedExperiment_1.12.0 DelayedArray_0.8.0
[13] BiocParallel_1.16.6         matrixStats_0.54.0
[15] Biobase_2.42.0              GenomicRanges_1.34.0
[17] GenomeInfoDb_1.18.2         IRanges_2.16.0
[19] S4Vectors_0.20.1            BiocGenerics_0.28.0

loaded via a namespace (and not attached):
 [1] bit64_0.9-7            splines_3.5.1          Formula_1.2-3
 [4] assertthat_0.2.1       BiocManager_1.30.4     latticeExtra_0.6-28
 [7] blob_1.2.0             GenomeInfoDbData_1.2.0 pillar_1.4.2
[10] RSQLite_2.1.1          backports_1.1.4        lattice_0.20-35
[13] limma_3.38.3           glue_1.3.1             digest_0.6.20
[16] XVector_0.22.0         checkmate_1.9.4        colorspace_1.4-1
[19] plyr_1.8.4             htmltools_0.3.6        Matrix_1.2-14
[22] XML_3.98-1.20          pkgconfig_2.0.2        zlibbioc_1.28.0
[25] purrr_0.3.2            xtable_1.8-4           scales_1.0.0
[28] htmlTable_1.13.1       tibble_2.1.3           annotate_1.60.1
[31] withr_2.1.2            nnet_7.3-12            lazyeval_0.2.2
[34] survival_2.42-3        magrittr_1.5           crayon_1.3.4
[37] memoise_1.1.0          foreign_0.8-70         tools_3.5.1
[40] data.table_1.12.2      stringr_1.4.0          locfit_1.5-9.1
[43] munsell_0.5.0          cluster_2.0.7-1        AnnotationDbi_1.44.0
[46] compiler_3.5.1         rlang_0.4.0            grid_3.5.1
[49] RCurl_1.95-4.12        rstudioapi_0.10        htmlwidgets_1.3
[52] bitops_1.0-6           tcltk_3.5.1            base64enc_0.1-3
[55] gtable_0.3.0           reshape_0.8.8          DBI_1.0.0
[58] R6_2.4.0               knitr_1.23             dplyr_0.8.3
[61] zeallot_0.1.0          bit_1.1-14             Hmisc_4.2-0
[64] stringi_1.4.3          Rcpp_1.0.1             geneplotter_1.60.0
[67] vctrs_0.2.0            rpart_4.1-13           acepack_1.4.1
[70] tidyselect_0.2.5       xfun_0.8
deseq2 • 636 views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 1 hour ago
United States

Thanks for this report. It's a SummarizedExperiment feature:

> metadata(se) <- list(version=1)
> metadata(se2) <- list(version=1)
> rbind(se,se2)
class: SummarizedExperiment 
dim: 0 0 
metadata(2): version version
assays(0):
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):

I'll come up with a fix for DESeq2's use of rbind tomorrow and push to devel branch.

ADD COMMENT
0
Entering edit mode

Thanks, I think I have a fix in devel to deal with rbind. It's v1.25.13.

Can you confirm that it looks good on your end?

install_github("mikelove/DESeq2")
ADD REPLY
0
Entering edit mode

That was fun. I installed several github bioconductor packages before realizing I should probably have installed the devel version of Bioconductor in one step. Developer I am not.

> library(BiocParallel)
> register(MulticoreParam(3))
> library("airway")
> data("airway")
> ddsSE = DESeqDataSet (airway, design = ~ cell + dex)
> ddsSE = DESeq(ddsSE, parallel=T)
> ddsSE
class: DESeqDataSet 
dim: 64102 8 
metadata(2): NA version
...

Good on this end.

ADD REPLY
0
Entering edit mode

Thanks. Oops I should have added dependencies=FALSE

ADD REPLY
0
Entering edit mode

Even without dependencies It still throws the error for minimum version for S4Vectors and BiocGenerics during compilation, then has issues with Iranges having an array sum error and more. It appears there are some big changes for 3.10?

ADD REPLY
0
Entering edit mode

It looks like you are a release behind? I think DESeq2 v1.24 and v1.25 are similar wrt objects.

ADD REPLY

Login before adding your answer.

Traffic: 871 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6