converting Seurat V5 with BPcells matrix to SingleCellExperiment
1
0
Entering edit mode
Le • 0
@bf1730d5
Last seen 3 months ago
United States

Hi, I'm trying to convert a pretty big merged Seurat V5 object (30k features x 800k cells) with only raw counts and metadata to a SingleCellExperiment object. I have tried a few different things but all had problems:

1). Right after merging 100 small Seurat objects, I tried to directly JoinLayers but ran into probably memory issue. I'm using 300Gb for this.

merged_obj<-merge(x=obj_list[[1]],y=obj_list[-1],merge.data = F)
obj_merged<-JoinLayers(obj_merged)

Error in eval(expr, envir, enclos): vector::reserve
Traceback:

1. JoinLayers(obj_merged)
2. JoinLayers.Seurat(obj_merged)
3. JoinLayers(object = object[[assay]], layers = layers, new = new, 
 .     ...)
4. JoinLayers.Assay5(object = object[[assay]], layers = layers, 
 .     new = new, ...)
5. JoinSingleLayers(object = object, layers = layers[i], new = new[i], 
 .     default = TRUE, ...)
6. StitchMatrix(x = LayerData(object = object, layer = layers[1L]), 
 .     y = lapply(X = layers[2:length(x = layers)], FUN = LayerData, 
 .         object = object), rowmap = slot(object = object, name = "features")[, 
 .         layers], colmap = slot(object = object, name = "cells")[, 
 .         layers])
7. StitchMatrix.dgCMatrix(x = LayerData(object = object, layer = layers[1L]), 
 .     y = lapply(X = layers[2:length(x = layers)], FUN = LayerData, 
 .         object = object), rowmap = slot(object = object, name = "features")[, 
 .         layers], colmap = slot(object = object, name = "cells")[, 
 .         layers])
8. RowMergeSparseMatrices(mat1 = x, mat2 = y)
9. RowMergeMatricesList(mat_list = all.mat, mat_rownames = all.rownames, 
 .     all_rownames = all.names)

2). To circumvent memory problems, I used BPCells to write counts into disks first, JoinLayers, then convert merged matrix back to sparse matrix.

library(BPCells) 
for (i in 1:length(Layers(obj))){
write_matrix_dir(mat = obj[["RNA"]][i], dir = paste0("./working_dir/layer_",i))
 LayerData(obj, assay = "RNA", layer = paste0("counts.",i))<- open_matrix_dir(dir = paste0("./working_dir/layer_",i))
    print(i)
    }
obj<-JoinLayers(obj)
obj[["RNA"]]$counts<-as(obj[["RNA"]]$counts, "dgCMatrix")

Error in sprintf("Input matrix has %d entries", length(res@index)): invalid format '%d'; use format %f, %e, %g or %a for numeric objects
Traceback:

1. as(obj[["RNA"]]$counts, "dgCMatrix")
2. asMethod(object)
3. rlang::abort(c("Error converting IterableMatrix to dgCMatrix", 
 .     "dgCMatrix objects cannot hold more than 2^31 non-zero entries", 
 .     sprintf("Input matrix has %d entries", length(res@index))))
4. is_formula(message, scoped = TRUE, lhs = FALSE)
5. sprintf("Input matrix has %d entries", length(res@index))

3). I thought about converting each small seurat obj into SCE format first, then merge them together. However, the different features in each obj prevents simple cbind() operation. scMerge package does not support sparse matrix format for the union option, which is what Seurat merge does.

Does anyone have any alternatives? Really appreciate your help!

sessionInfo():
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /u/local/compilers/intel/oneapi/2022.1.1/mkl/2022.0.1/lib/intel64/libmkl_gf_lp64.so.2;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Los_Angeles
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] BPCells_0.2.0               scMerge_1.18.0             
 [3] dreamlet_1.1.19             SingleCellExperiment_1.22.0
 [5] SummarizedExperiment_1.32.0 Biobase_2.62.0             
 [7] GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
 [9] IRanges_2.36.0              S4Vectors_0.40.2           
[11] BiocGenerics_0.48.1         MatrixGenerics_1.14.0      
[13] matrixStats_1.3.0           variancePartition_1.33.11  
[15] BiocParallel_1.36.0         limma_3.58.1               
[17] ggplot2_3.5.1               Seurat_5.1.0               
[19] SeuratObject_5.0.2          sp_2.1-4                   

loaded via a namespace (and not attached):
  [1] GSEABase_1.64.0           progress_1.2.3           
  [3] nnet_7.3-18               goftest_1.2-3            
  [5] Biostrings_2.70.3         rstan_2.32.6             
  [7] vctrs_0.6.5               spatstat.random_3.2-3    
  [9] digest_0.6.35             png_0.1-8                
 [11] corpcor_1.6.10            ggrepel_0.9.5            
 [13] mixsqp_0.3-54             IRdisplay_1.1            
 [15] deldir_2.0-4              parallelly_1.37.1        
 [17] batchelor_1.16.0          MASS_7.3-58.4            
 [19] reshape2_1.4.4            SQUAREM_2021.1           
 [21] httpuv_1.6.15             withr_3.0.0              
 [23] xfun_0.45                 survival_3.5-5           
 [25] proxyC_0.4.1              memoise_2.0.1            
 [27] ggbeeswarm_0.7.2          zoo_1.8-12               
 [29] gtools_3.9.5              KEGGgraph_1.62.0         
 [31] V8_4.4.2                  pbapply_1.7-2            
 [33] DEoptimR_1.1-3            IRkernel_1.3.2.9000      
 [35] Formula_1.2-5             prettyunits_1.2.0        
 [37] KEGGREST_1.42.0           promises_1.3.0           
 [39] httr_1.4.7                cvTools_0.3.3            
 [41] globals_0.16.3            fitdistrplus_1.1-11      
 [43] ashr_2.2-63               rstudioapi_0.16.0        
 [45] miniUI_0.1.1.1            generics_0.1.3           
 [47] base64enc_0.1-3           babelgene_22.9           
 [49] curl_5.2.1                repr_1.1.7               
 [51] sfsmisc_1.1-18            zlibbioc_1.48.2          
 [53] ScaledMatrix_1.10.0       polyclip_1.10-6          
 [55] RcppZiggurat_0.1.6        GenomeInfoDbData_1.2.11  
 [57] SparseArray_1.2.4         xtable_1.8-4             
 [59] stringr_1.5.1             evaluate_0.24.0          
 [61] S4Arrays_1.2.1            Rfast_2.1.0              
 [63] hms_1.1.3                 irlba_2.3.5.1            
 [65] colorspace_2.1-0          ROCR_1.0-11              
 [67] reticulate_1.37.0         spatstat.data_3.0-4      
 [69] magrittr_2.0.3            lmtest_0.9-40            
 [71] Rgraphviz_2.46.0          viridis_0.6.5            
 [73] later_1.3.2               lattice_0.21-8           
 [75] robustbase_0.99-2         spatstat.geom_3.2-9      
 [77] future.apply_1.11.2       scuttle_1.12.0           
 [79] scattermore_1.2           XML_3.99-0.16.1          
 [81] cowplot_1.1.3             RcppAnnoy_0.0.22         
 [83] Hmisc_5.1-2               pillar_1.9.0             
 [85] StanHeaders_2.32.7        nlme_3.1-162             
 [87] iterators_1.0.14          caTools_1.18.2           
 [89] compiler_4.3.0            beachmat_2.18.1          
 [91] RSpectra_0.16-1           stringi_1.8.4            
 [93] rmeta_3.0                 tensor_1.5               
 [95] minqa_1.2.6               plyr_1.8.9               
 [97] scater_1.30.1             msigdbr_7.5.1            
 [99] crayon_1.5.2              abind_1.4-5              
[101] truncnorm_1.0-9           metadat_1.2-0            
[103] locfit_1.5-9.9            bit_4.0.5                
[105] mathjaxr_1.6-0            dplyr_1.1.4              
[107] codetools_0.2-19          BiocSingular_1.18.0      
[109] QuickJSR_1.1.3            plotly_4.10.4            
[111] remaCor_0.0.18            mime_0.12                
[113] splines_4.3.0             Rcpp_1.0.12              
[115] fastDummies_1.7.3         sparseMatrixStats_1.14.0 
[117] EnrichmentBrowser_2.32.0  knitr_1.47               
[119] blob_1.2.4                utf8_1.2.4               
[121] reldist_1.7-2             pbdZMQ_0.3-11            
[123] lme4_1.1-35.3             listenv_0.9.1            
[125] checkmate_2.3.1           DelayedMatrixStats_1.24.0
[127] Rdpack_2.6                pkgbuild_1.4.4           
[129] tibble_3.2.1              Matrix_1.6-5             
[131] statmod_1.5.0             startupmsg_0.9.6.1       
[133] fANCOVA_0.6-1             pkgconfig_2.0.3          
[135] tools_4.3.0               cachem_1.1.0             
[137] RhpcBLASctl_0.23-42       rbibutils_2.2.16         
[139] RSQLite_2.3.6             viridisLite_0.4.2        
[141] DBI_1.2.2                 numDeriv_2016.8-1.1      
[143] fastmap_1.2.0             rmarkdown_2.27           
[145] scales_1.3.0              grid_4.3.0               
[147] ica_1.0-3                 broom_1.0.5              
[149] patchwork_1.2.0           BiocManager_1.30.22      
[151] dotCall64_1.1-1           graph_1.80.0             
[153] zenith_1.5.3              RANN_2.6.1               
[155] rpart_4.1.19              aod_1.3.3                
[157] mgcv_1.8-42               foreign_0.8-84           
[159] cli_3.6.2                 purrr_1.0.2              
[161] leiden_0.4.3.1            lifecycle_1.0.4          
[163] mashr_0.2.79              uwot_0.2.2               
[165] M3Drop_1.28.8             mvtnorm_1.2-4            
[167] bluster_1.12.0            backports_1.4.1          
[169] distr_2.9.3               annotate_1.80.0          
[171] gtable_0.3.5              ggridges_0.5.6           
[173] metafor_4.6-0             densEstBayes_1.0-2.2     
[175] progressr_0.14.0          parallel_4.3.0           
[177] jsonlite_1.8.8            edgeR_4.0.16             
[179] RcppHNSW_0.6.0            bitops_1.0-7             
[181] bit64_4.0.5               assertthat_0.2.1         
[183] Rtsne_0.17                spatstat.utils_3.0-5     
[185] BiocNeighbors_1.20.2      RcppParallel_5.1.7       
[187] bdsmatrix_1.3-7           metapod_1.10.1           
[189] dqrng_0.4.1               loo_2.7.0                
[191] pbkrtest_0.5.2            lazyeval_0.2.2           
[193] shiny_1.8.1.1             ruv_0.9.7.1              
[195] htmltools_0.5.8           sctransform_0.4.1        
[197] glue_1.7.0                spam_2.10-0              
[199] ResidualMatrix_1.10.0     XVector_0.42.0           
[201] RCurl_1.98-1.14           scran_1.30.2             
[203] gridExtra_2.3             EnvStats_2.8.1           
[205] boot_1.3-28.1             igraph_2.0.3             
[207] invgamma_1.1              R6_2.5.1                 
[209] tidyr_1.3.1               gplots_3.1.3.1           
[211] cluster_2.1.4             bbmle_1.0.25.1           
[213] nloptr_2.0.3              rstantools_2.4.0.9000    
[215] DelayedArray_0.28.0       tidyselect_1.2.1         
[217] vipor_0.4.7               htmlTable_2.4.2          
[219] inline_0.3.19             AnnotationDbi_1.64.1     
[221] future_1.33.2             rsvd_1.0.5               
[223] munsell_0.5.1             KernSmooth_2.23-20       
[225] data.table_1.15.4         htmlwidgets_1.6.4        
[227] RColorBrewer_1.1-3        rlang_1.1.4              
[229] spatstat.sparse_3.0-3     spatstat.explore_3.2-7   
[231] lmerTest_3.1-3            uuid_1.2-0               
[233] fansi_1.0.6               beeswarm_0.4.0
SeuratV5 SingleCellExperiment BPCells SingleCellExper • 1.2k views
ADD COMMENT
0
Entering edit mode
yxsee • 0
@206ec564
Last seen 5 months ago
Singapore

You should write obj[["RNA"]][i]$counts to disk instead of the entire RNA assay.

for (i in 1:length(Layers(obj))){
write_matrix_dir(mat = obj[["RNA"]][i]$counts, dir = paste0("./working_dir/layer_",i))
obj[["RNA"]][i]$counts <- open_matrix_dir(dir = paste0("./working_dir/layer_",i))
    print(i)
    }
ADD COMMENT

Login before adding your answer.

Traffic: 507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6