scater::calculatePCA error when using BPPARAM = MulticoreParam
doliv071 • 0
Last seen 3.8 years ago
United States

scater's runPCA/calculatePCA fail when run in parallel with MulticoreParam().

The error also leaves any worker processes that were already started still running; they are not terminated when the code errors out.

Is there a workaround or fix for this?

Thanks, Dave


> suppressPackageStartupMessages({
+     library(TENxPBMCData)
+     library(scater)
+     library(scuttle)
+     library(BiocParallel)
+ })
> tenx_pbmc3k <- TENxPBMCData(dataset="pbmc3k")
snapshotDate(): 2020-10-27
see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
loading from cache
> logcounts(tenx_pbmc3k) <- scuttle::normalizeCounts(x = tenx_pbmc3k, log = TRUE)
> assayNames(tenx_pbmc3k)
[1] "counts"    "logcounts"
> assay(tenx_pbmc3k, "logcounts")
<32738 x 2700> matrix of class DelayedMatrix and type "double":
                   [,1]    [,2]    [,3] ... [,2699] [,2700]
ENSG00000243485       0       0       0   .       0       0
ENSG00000237613       0       0       0   .       0       0
ENSG00000186092       0       0       0   .       0       0
ENSG00000238009       0       0       0   .       0       0
ENSG00000239945       0       0       0   .       0       0
            ...       .       .       .   .       .       .
ENSG00000215635       0       0       0   .       0       0
ENSG00000268590       0       0       0   .       0       0
ENSG00000251180       0       0       0   .       0       0
ENSG00000215616       0       0       0   .       0       0
ENSG00000215611       0       0       0   .       0       0
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k)
> reducedDims(tenx_pbmc3k)
List of length 1
names(1): PCA
> # this no worky
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k, ncomponents = 50, 
+                               BPPARAM = MulticoreParam(6))
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
> traceback()
16: serialize(data, node$con, xdr = FALSE)
15: sendData.SOCK0node(backend[[node]], value)
14: parallel:::sendData(backend[[node]], value)
13: .send_to(cluster, i, .DONE())
12: .send_to(cluster, i, .DONE())
11: .bpstop_nodes(x)
10: .bpstop_impl(x)
9: bpstop(BPPARAM)
8: bpstop(BPPARAM)
7: .calculate_pca(mat, transposed = !is.null(dimred), ...)
6: .local(x, ...)
5: calculatePCA(y, ...)
4: calculatePCA(y, ...)
3: .local(x, ...)
2: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
1: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
> BiocManager::valid()
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details

replacement repositories:

* sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocParallel_1.24.1         scuttle_1.0.4               scater_1.18.6               ggplot2_3.3.3              
 [5] TENxPBMCData_1.8.0          HDF5Array_1.18.1            rhdf5_2.34.0                DelayedArray_0.16.3        
 [9] Matrix_1.3-4                SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0             
[13] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1           
[17] BiocGenerics_0.36.1         MatrixGenerics_1.2.1        matrixStats_0.59.0         

loaded via a namespace (and not attached):
 [1] viridis_0.6.1                 httr_1.4.2                    BiocSingular_1.6.0           
 [4] viridisLite_0.4.0             bit64_4.0.5                   AnnotationHub_2.22.1         
 [7] DelayedMatrixStats_1.12.3     shiny_1.6.0                   assertthat_0.2.1             
[10] interactiveDisplayBase_1.28.0 BiocManager_1.30.15           BiocFileCache_1.14.0         
[13] blob_1.2.1                    vipor_0.4.5                   GenomeInfoDbData_1.2.4       
[16] yaml_2.2.1                    BiocVersion_3.12.0            pillar_1.6.1                 
[19] RSQLite_2.2.7                 lattice_0.20-44               beachmat_2.6.4               
[22] glue_1.4.2                    digest_0.6.27                 promises_1.2.0.1             
[25] XVector_0.30.0                colorspace_2.0-1              htmltools_0.5.1.1            
[28] httpuv_1.6.1                  pkgconfig_2.0.3               zlibbioc_1.36.0              
[31] purrr_0.3.4                   xtable_1.8-4                  scales_1.1.1                 
[34] later_1.2.0                   tibble_3.1.2                  generics_0.1.0               
[37] ellipsis_0.3.2                cachem_1.0.5                  withr_2.4.2                  
[40] magrittr_2.0.1                crayon_1.4.1                  mime_0.10                    
[43] memoise_2.0.0                 fansi_0.5.0                   beeswarm_0.4.0               
[46] tools_4.0.3                   lifecycle_1.0.0               Rhdf5lib_1.12.1              
[49] munsell_0.5.0                 irlba_2.3.3                   AnnotationDbi_1.52.0         
[52] compiler_4.0.3                rsvd_1.0.5                    rlang_0.4.11                 
[55] grid_4.0.3                    RCurl_1.98-1.3                BiocNeighbors_1.8.2          
[58] rhdf5filters_1.2.1            rappdirs_0.3.3                bitops_1.0-7                 
[61] ExperimentHub_1.16.1          gtable_0.3.0                  DBI_1.1.1                    
[64] curl_4.3.1                    R6_2.5.0                      gridExtra_2.3                
[67] dplyr_1.0.6                   fastmap_1.1.0                 bit_4.0.4                    
[70] utf8_1.2.1                    ggbeeswarm_0.6.0              Rcpp_1.0.6                   
[73] vctrs_0.3.8                   sparseMatrixStats_1.2.1       dbplyr_2.1.1                 
[76] tidyselect_1.1.1             

Bioconductor version '3.12'

  * 0 packages out-of-date
  * 1 packages too new

create a valid installation with

  BiocManager::install("harmony", update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message:
0 packages out-of-date; 1 packages too new
calculatePCA BPPARAM scater BiocParallel runPCA • 1.8k views
Aaron Lun ★ 28k
Last seen 12 hours ago
The city by the bay

There was a bug in scater::runPCA relating to the handling of non-serial backends - see for details. This has since been fixed in the latest version.

Note, though, that scater::runPCA's default algorithm choice of IRLBA is quite slow for file-backed matrices like those from the TENxPBMCData package. You'd be better off using BSPARAM=BiocSingular::RandomParam() instead.
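
As a rough illustration of the BSPARAM suggestion above, the sketch below runs runPCA with the randomized SVD from BiocSingular. The simulated dataset from scuttle::mockSCE(), the object name sce, and the cell, gene and component counts are assumptions chosen to keep the example small and in memory; they are not taken from the original post, which used the file-backed pbmc3k data.

```r
library(scater)
library(scuttle)
library(BiocSingular)

# Assumption: a small simulated dataset stands in for the pbmc3k data.
set.seed(100)
sce <- mockSCE(ncells = 200, ngenes = 1000)
sce <- logNormCounts(sce)

# Randomized SVD is stochastic, so set a seed for reproducible PCs.
set.seed(100)
sce <- runPCA(sce, ncomponents = 20, BSPARAM = RandomParam())

# One row per cell, one column per retained component.
pca_dims <- dim(reducedDim(sce, "PCA"))
```

The same BSPARAM argument should apply unchanged to a DelayedMatrix-backed object; only the input data differs.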

Thanks Aaron,

I used SnowParam as a workaround for the time being.

Regarding file-backed matrices, I was under the impression that DelayedArray had its own optimized matrix multiplication operator via DelayedMatrixStats. I will switch over to RandomParam; thanks for the advice and assistance.
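
For anyone landing here later, a minimal sketch of the SnowParam workaround might look like the following. It also starts and stops the backend explicitly, so the workers are shut down even if runPCA throws. The simulated data from scuttle::mockSCE(), the names sce and bp, and the worker count of 2 are illustrative assumptions, not the settings from the original session.

```r
library(scater)
library(scuttle)
library(BiocParallel)

# Assumption: a small simulated dataset stands in for the pbmc3k data.
set.seed(100)
sce <- mockSCE(ncells = 200, ngenes = 1000)
sce <- logNormCounts(sce)

# Socket-based workers instead of forked ones.
bp <- SnowParam(workers = 2)

# Start the workers ourselves and guarantee they are stopped afterwards,
# whether or not the PCA step succeeds.
bpstart(bp)
sce <- tryCatch(
    runPCA(sce, ncomponents = 20, BPPARAM = bp),
    finally = bpstop(bp)
)

# Should be FALSE once the workers have been shut down.
workers_still_up <- bpisup(bp)
```

Managing the backend outside of runPCA is a design choice made here so that cleanup does not depend on how the called function handles errors.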


