scater::calculatePCA error when using BPPARAM = MulticoreParam
doliv071 • 0
@doliv071-8144
United States

scater's runPCA/calculatePCA fail when asked to run in parallel with BPPARAM = MulticoreParam().

The error also leaves any worker processes that were already started running; they are not terminated when the call fails.

Is there a workaround or fix for this?

Thanks, Dave

# BEGIN REPREX #

> suppressPackageStartupMessages({
+     library(TENxPBMCData)
+     library(scater)
+     library(scuttle)
+     library(BiocParallel)
+ })
> tenx_pbmc3k <- TENxPBMCData(dataset="pbmc3k")
snapshotDate(): 2020-10-27
see ?TENxPBMCData and browseVignettes('TENxPBMCData') for documentation
loading from cache
> logcounts(tenx_pbmc3k) <- scuttle::normalizeCounts(x = tenx_pbmc3k, log = T)
> assayNames(tenx_pbmc3k)
[1] "counts"    "logcounts"
> assay(tenx_pbmc3k, "logcounts")
<32738 x 2700> matrix of class DelayedMatrix and type "double":
                   [,1]    [,2]    [,3] ... [,2699] [,2700]
ENSG00000243485       0       0       0   .       0       0
ENSG00000237613       0       0       0   .       0       0
ENSG00000186092       0       0       0   .       0       0
ENSG00000238009       0       0       0   .       0       0
ENSG00000239945       0       0       0   .       0       0
            ...       .       .       .   .       .       .
ENSG00000215635       0       0       0   .       0       0
ENSG00000268590       0       0       0   .       0       0
ENSG00000251180       0       0       0   .       0       0
ENSG00000215616       0       0       0   .       0       0
ENSG00000215611       0       0       0   .       0       0
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k)
> reducedDims(tenx_pbmc3k)
List of length 1
names(1): PCA
> # this no worky
> tenx_pbmc3k <- scater::runPCA(tenx_pbmc3k, ncomponents = 50, 
+                               BPPARAM = MulticoreParam(6))
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
Error in serialize(data, node$con, xdr = FALSE) : 
  error writing to connection
> traceback()
16: serialize(data, node$con, xdr = FALSE)
15: sendData.SOCK0node(backend[[node]], value)
14: parallel:::sendData(backend[[node]], value)
13: .send_to(cluster, i, .DONE())
12: .send_to(cluster, i, .DONE())
11: .bpstop_nodes(x)
10: .bpstop_impl(x)
9: bpstop(BPPARAM)
8: bpstop(BPPARAM)
7: .calculate_pca(mat, transposed = !is.null(dimred), ...)
6: .local(x, ...)
5: calculatePCA(y, ...)
4: calculatePCA(y, ...)
3: .local(x, ...)
2: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
1: scater::runPCA(tenx_pbmc3k, ncomponents = 50, BPPARAM = MulticoreParam(6))
> BiocManager::valid()
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details

replacement repositories:
    CRAN: https://cloud.r-project.org


* sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocParallel_1.24.1         scuttle_1.0.4               scater_1.18.6               ggplot2_3.3.3              
 [5] TENxPBMCData_1.8.0          HDF5Array_1.18.1            rhdf5_2.34.0                DelayedArray_0.16.3        
 [9] Matrix_1.3-4                SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0             
[13] GenomicRanges_1.42.0        GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1           
[17] BiocGenerics_0.36.1         MatrixGenerics_1.2.1        matrixStats_0.59.0         

loaded via a namespace (and not attached):
 [1] viridis_0.6.1                 httr_1.4.2                    BiocSingular_1.6.0           
 [4] viridisLite_0.4.0             bit64_4.0.5                   AnnotationHub_2.22.1         
 [7] DelayedMatrixStats_1.12.3     shiny_1.6.0                   assertthat_0.2.1             
[10] interactiveDisplayBase_1.28.0 BiocManager_1.30.15           BiocFileCache_1.14.0         
[13] blob_1.2.1                    vipor_0.4.5                   GenomeInfoDbData_1.2.4       
[16] yaml_2.2.1                    BiocVersion_3.12.0            pillar_1.6.1                 
[19] RSQLite_2.2.7                 lattice_0.20-44               beachmat_2.6.4               
[22] glue_1.4.2                    digest_0.6.27                 promises_1.2.0.1             
[25] XVector_0.30.0                colorspace_2.0-1              htmltools_0.5.1.1            
[28] httpuv_1.6.1                  pkgconfig_2.0.3               zlibbioc_1.36.0              
[31] purrr_0.3.4                   xtable_1.8-4                  scales_1.1.1                 
[34] later_1.2.0                   tibble_3.1.2                  generics_0.1.0               
[37] ellipsis_0.3.2                cachem_1.0.5                  withr_2.4.2                  
[40] magrittr_2.0.1                crayon_1.4.1                  mime_0.10                    
[43] memoise_2.0.0                 fansi_0.5.0                   beeswarm_0.4.0               
[46] tools_4.0.3                   lifecycle_1.0.0               Rhdf5lib_1.12.1              
[49] munsell_0.5.0                 irlba_2.3.3                   AnnotationDbi_1.52.0         
[52] compiler_4.0.3                rsvd_1.0.5                    rlang_0.4.11                 
[55] grid_4.0.3                    RCurl_1.98-1.3                BiocNeighbors_1.8.2          
[58] rhdf5filters_1.2.1            rappdirs_0.3.3                bitops_1.0-7                 
[61] ExperimentHub_1.16.1          gtable_0.3.0                  DBI_1.1.1                    
[64] curl_4.3.1                    R6_2.5.0                      gridExtra_2.3                
[67] dplyr_1.0.6                   fastmap_1.1.0                 bit_4.0.4                    
[70] utf8_1.2.1                    ggbeeswarm_0.6.0              Rcpp_1.0.6                   
[73] vctrs_0.3.8                   sparseMatrixStats_1.2.1       dbplyr_2.1.1                 
[76] tidyselect_1.1.1             

Bioconductor version '3.12'

  * 0 packages out-of-date
  * 1 packages too new

create a valid installation with

  BiocManager::install("harmony", update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message:
0 packages out-of-date; 1 packages too new
calculatePCA BPPARAM scater BiocParallel runPCA • 1.7k views
Aaron Lun ★ 28k
@alun
The city by the bay

There was a bug in scater::runPCA relating to the handling of non-serial backends - see https://github.com/Alanocallaghan/scater/issues/148 for details. This has since been fixed in the latest version.

Note, though, that scater::runPCA's default algorithm choice of IRLBA is quite slow for file-backed matrices like those from the TENxPBMCData package. You'd be better off using BSPARAM=BiocSingular::RandomParam() instead.
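A minimal sketch of the BSPARAM suggestion above, for illustration only. To keep it self-contained it uses a small simulated dataset from scuttle::mockSCE() rather than the file-backed pbmc3k matrix; the dataset size, the seed, and ncomponents = 20 are assumptions, not values from this thread.

```r
library(scater)        # also attaches SingleCellExperiment and scuttle
library(BiocSingular)

set.seed(100)  # RandomParam() uses a randomized SVD, so fix the seed for reproducibility

# Simulated stand-in for the real data (assumption: 200 cells, 1000 genes).
sce <- scuttle::mockSCE(ncells = 200, ngenes = 1000)
sce <- scuttle::logNormCounts(sce)

# Ask runPCA to use a randomized SVD instead of the default algorithm.
sce <- scater::runPCA(sce, ncomponents = 20,
                      BSPARAM = BiocSingular::RandomParam())

pcs <- reducedDim(sce, "PCA")
dim(pcs)  # cells x components
```

The same BSPARAM argument should apply unchanged to a DelayedMatrix-backed object such as tenx_pbmc3k; whether it is actually faster there depends on the data and storage, so it is worth timing both options.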


Thanks Aaron,

I used SnowParam as a workaround for the time being.

Regarding file-backed matrices, I was under the impression that DelayedArray had its own optimized matrix multiplication operator via DelayedMatrixStats. I will switch over to RandomParam, thanks for the advice and assistance.

-Dave
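For readers looking for the SnowParam workaround mentioned in this reply, it might look roughly like the sketch below. The exact call used was not posted, so the worker count, the simulated data from scuttle::mockSCE(), and ncomponents = 20 are all assumptions.

```r
library(scater)        # also attaches SingleCellExperiment and scuttle
library(BiocParallel)

set.seed(100)

# Simulated stand-in for the real data (assumption: 200 cells, 1000 genes).
sce <- scuttle::mockSCE(ncells = 200, ngenes = 1000)
sce <- scuttle::logNormCounts(sce)

# Socket-based backend used in place of MulticoreParam().
# Two workers is an arbitrary choice for the example.
bp <- SnowParam(workers = 2)

sce <- scater::runPCA(sce, ncomponents = 20, BPPARAM = bp)

pcs_snow <- reducedDim(sce, "PCA")
dim(pcs_snow)  # cells x components
```

SnowParam starts separate R processes rather than forking, so it usually carries more startup and data-transfer overhead than MulticoreParam; on a scater version that includes the fix, MulticoreParam should be usable again.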
