Hello,
I have a count table of 60668 genes x 217 samples.
Using only the design ~ group,
DESeq2 takes ~10 minutes to finish. But using ~ group + patient,
it takes forever: it has now been running for more than 12 hours and is stuck at "gene-wise dispersion estimates: 5 workers".
In the colData, the patient column is defined as :
metadata$patient %>% table() %>% table()
  1   2   3   4   5   6   8  15
120  22   5   1   1   1   1   1
Thus the number of samples per patient varies from patient to patient: 120 patients have only 1 sample, while (at the other extreme) 1 patient has 15 samples.
Is this expected to take so long?
Thanks
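To put the patient table above in perspective, here is a rough base-R sketch of how many coefficients the design ~ group + patient implies. The patient IDs and the two-level `group` factor are made up for illustration (the real grouping is not shown in the post); only the samples-per-patient counts come from the table.

```r
# Samples-per-patient distribution copied from the table in the post:
# e.g. 120 patients have 1 sample, 22 patients have 2 samples, and so on.
samples_per_patient <- c(1, 2, 3, 4, 5, 6, 8, 15)
n_patients          <- c(120, 22, 5, 1, 1, 1, 1, 1)

total_patients <- sum(n_patients)                        # 152
total_samples  <- sum(samples_per_patient * n_patients)  # 217

# Rebuild a patient factor with the same structure (IDs are hypothetical).
patient <- factor(rep(paste0("P", seq_len(total_patients)),
                      times = rep(samples_per_patient, times = n_patients)))

# Hypothetical two-level group, alternating across samples (an assumption).
group <- factor(rep(c("A", "B"), length.out = length(patient)))

# One intercept, one group coefficient, and one coefficient per
# non-reference patient level.
design <- model.matrix(~ group + patient)
dim(design)
# [1] 217 153
```

Under these assumptions the model has 153 coefficients per gene instead of 2 for ~ group, which may be part of why the fit is so much slower; this sketch does not measure DESeq2 itself.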
Here is my code:
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countData = counts, colData = metadata, design = ~ group + patient)
dds <- estimateSizeFactors(dds)
# keep genes with at least 10 normalized reads in at least 10 samples
keep <- rowSums(counts(dds, normalized = TRUE) >= 10) >= 10
dds <- dds[keep, ]
# only 37001 genes are kept
# multithread DESeq2 with 5 workers
library("BiocParallel")
register(MulticoreParam(5))
dds <- DESeq(dds, parallel = TRUE)
#
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.1
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=fr_BE.UTF-8 LC_NUMERIC=C LC_TIME=fr_BE.UTF-8 LC_COLLATE=fr_BE.UTF-8 LC_MONETARY=fr_BE.UTF-8
[6] LC_MESSAGES=fr_BE.UTF-8 LC_PAPER=fr_BE.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_BE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggrepel_0.8.1 pheatmap_1.0.12 RColorBrewer_1.1-2 cowplot_1.0.0 forcats_0.4.0
[6] stringr_1.4.0 dplyr_0.8.4 purrr_0.3.3 readr_1.3.1 tidyr_1.0.2
[11] tibble_2.1.3 ggplot2_3.2.1 tidyverse_1.3.0 DESeq2_1.26.0 SummarizedExperiment_1.16.1
[16] DelayedArray_0.12.2 BiocParallel_1.20.1 matrixStats_0.55.0 Biobase_2.46.0 GenomicRanges_1.38.0
[21] GenomeInfoDb_1.22.0 IRanges_2.20.2 S4Vectors_0.24.3 BiocGenerics_0.32.0
loaded via a namespace (and not attached):
[1] nlme_3.1-144 fs_1.3.1 bitops_1.0-6 lubridate_1.7.4 bit64_0.9-7 httr_1.4.1
[7] tools_3.6.2 backports_1.1.5 utf8_1.1.4 R6_2.4.1 rpart_4.1-15 Hmisc_4.3-1
[13] DBI_1.1.0 lazyeval_0.2.2 colorspace_1.4-1 nnet_7.3-12 withr_2.1.2 tidyselect_1.0.0
[19] gridExtra_2.3 bit_1.1-15.2 compiler_3.6.2 cli_2.0.1 rvest_0.3.5 htmlTable_1.13.3
[25] xml2_1.2.2 labeling_0.3 scales_1.1.0 checkmate_2.0.0 genefilter_1.68.0 digest_0.6.23
[31] foreign_0.8-75 XVector_0.26.0 base64enc_0.1-3 jpeg_0.1-8.1 pkgconfig_2.0.3 htmltools_0.4.0
[37] dbplyr_1.4.2 readxl_1.3.1 htmlwidgets_1.5.1 rlang_0.4.4 rstudioapi_0.11 RSQLite_2.2.0
[43] farver_2.0.3 generics_0.0.2 jsonlite_1.6.1 acepack_1.4.1 RCurl_1.98-1.1 magrittr_1.5
[49] GenomeInfoDbData_1.2.2 Formula_1.2-3 Matrix_1.2-18 fansi_0.4.1 Rcpp_1.0.3 munsell_0.5.0
[55] lifecycle_0.1.0 stringi_1.4.5 yaml_2.2.1 zlibbioc_1.32.0 grid_3.6.2 blob_1.2.1
[61] crayon_1.3.4 lattice_0.20-38 haven_2.2.0 splines_3.6.2 annotate_1.64.0 hms_0.5.3
[67] locfit_1.5-9.1 knitr_1.28 pillar_1.4.3 geneplotter_1.64.0 reprex_0.3.0 XML_3.99-0.3
[73] glue_1.3.1 latticeExtra_0.6-29 BiocManager_1.30.10 data.table_1.12.8 modelr_0.1.5 png_0.1-7
[79] vctrs_0.2.2 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 xfun_0.12 xtable_1.8-4
[85] broom_0.5.4 survival_3.1-8 AnnotationDbi_1.48.0 memoise_1.1.0 cluster_2.1.0 ellipsis_0.3.0
Also make sure you update to the latest version. You didn’t note your session info or version.
Thanks @Michael. I have version 1.26, so the new speed optimization should be there. I will try without parallel.