Hi,
I am running (to me) identical code for some clustering but despite fixed seeds it is inconsistent when using either lapply or bplapply:
Any idea what is happening?
suppressMessages({
library(BiocParallel)
library(bluster)
library(scran)
library(scuttle)
})
# Mock up some data
set.seed(1)
sce <- logNormCounts(mockSCE())
set.seed(1)
sce <- scater::runPCA(sce)
blusparam <- bluster::SNNGraphParam(k=20, cluster.fun="louvain")
# Get clusters
get_clusters <- function(){
set.seed(1)
d <- scran::clusterCells(sce, use.dimred="PCA", BLUSPARAM=blusparam)
nlevels(factor(d))
}
# lapply 4 clusters, bplapply 5 clusters
la <- lapply(1:10, function(x) get_clusters())
unique(unlist(la))
#> [1] 4
bp <- bplapply(1:10, function(x) get_clusters())
unique(unlist(bp))
#> [1] 5
sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats4 stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] scuttle_1.12.0 SingleCellExperiment_1.24.0
#> [3] SummarizedExperiment_1.32.0 Biobase_2.62.0
#> [5] GenomicRanges_1.54.1 GenomeInfoDb_1.38.1
#> [7] IRanges_2.36.0 S4Vectors_0.40.2
#> [9] BiocGenerics_0.48.1 MatrixGenerics_1.14.0
#> [11] matrixStats_1.1.0 bluster_1.12.0
#> [13] BiocParallel_1.36.0
#>
#> loaded via a namespace (and not attached):
#> [1] tidyselect_1.2.0 viridisLite_0.4.2
#> [3] dplyr_1.1.4 vipor_0.4.5
#> [5] viridis_0.6.4 bitops_1.0-7
#> [7] fastmap_1.1.1 RCurl_1.98-1.13
#> [9] reprex_2.0.2 digest_0.6.33
#> [11] rsvd_1.0.5 lifecycle_1.0.4
#> [13] cluster_2.1.4 statmod_1.5.0
#> [15] magrittr_2.0.3 compiler_4.3.2
#> [17] rlang_1.1.2 tools_4.3.2
#> [19] igraph_1.5.1 utf8_1.2.4
#> [21] yaml_2.3.7 knitr_1.45
#> [23] S4Arrays_1.2.0 dqrng_0.3.2
#> [25] DelayedArray_0.28.0 abind_1.4-5
#> [27] withr_2.5.2 grid_4.3.2
#> [29] fansi_1.0.5 beachmat_2.18.0
#> [31] colorspace_2.1-0 edgeR_4.0.2
#> [33] ggplot2_3.4.4 scales_1.3.0
#> [35] cli_3.6.1 rmarkdown_2.25
#> [37] crayon_1.5.2 generics_0.1.3
#> [39] metapod_1.10.0 rstudioapi_0.15.0
#> [41] DelayedMatrixStats_1.24.0 ggbeeswarm_0.7.2
#> [43] zlibbioc_1.48.0 parallel_4.3.2
#> [45] XVector_0.42.0 vctrs_0.6.5
#> [47] Matrix_1.6-4 BiocSingular_1.18.0
#> [49] BiocNeighbors_1.20.0 ggrepel_0.9.4
#> [51] irlba_2.3.5.1 beeswarm_0.4.0
#> [53] scater_1.30.1 locfit_1.5-9.8
#> [55] limma_3.58.1 glue_1.6.2
#> [57] codetools_0.2-19 gtable_0.3.4
#> [59] ScaledMatrix_1.10.0 munsell_0.5.0
#> [61] tibble_3.2.1 pillar_1.9.0
#> [63] htmltools_0.5.7 GenomeInfoDbData_1.2.11
#> [65] R6_2.5.1 sparseMatrixStats_1.14.0
#> [67] evaluate_0.23 lattice_0.22-5
#> [69] scran_1.30.0 Rcpp_1.0.11
#> [71] gridExtra_2.3 SparseArray_1.2.2
#> [73] xfun_0.41 fs_1.6.3
#> [75] pkgconfig_2.0.3
Created on 2024-04-24 with reprex v2.0.2
Thanks Aaron for the response. RNGseed (for me) has no effect. I actually stumbled over this in one of my analysis where I have a constant
bpparam
definition that always has a fixed seed to exactly avoid these sorts of things. If you run this below the issue persists: