Reduce single cell experiment size to use as reference for annotation with SingleR
elgomez • 0
@elgomez-17056

Hi all,

I am trying to annotate a single cell dataset using a custom reference with SingleR.

I have previously successfully annotated this dataset using SingleR with other custom references but they were smaller (~1GB).

My dataset is ~15 GB and the dataset I would like to use as a reference (chimpanzee middle temporal gyrus from here: https://cellxgene.cziscience.com/collections/4dca242c-d302-4dba-a68f-4c61e7bad553) is ~12 GB.

When I try to run the SingleR command, R crashes even with 208 GB of memory allocated on my university's server.

Since the reference dataset I want to use is still a normal size for a single cell experiment, I am thinking others may have encountered a similar issue.

Does anyone have a solution? Thanks!!

Elaine

P.S. I am not putting my code in since I'm not getting an error.

#packages used below
library(Seurat)
library(SingleCellExperiment)
library(SingleR)

#reading in my dataset (SingleCellExperiment object)
PFC.merged.sce <- readRDS("sce_chimp_PFC.RDS")

#reading in the data I want to use as a reference with SingleR
chimp_MTG <- readRDS("sc-data/Jorstad/seur_chimp_MTG_Jorstad.rds")

#convert from Seurat to SCE object
seuMTG.chimp.sce <- as.SingleCellExperiment(chimp_MTG)

#to free memory going to remove seurat object:
rm(chimp_MTG)

#converting Ensembl IDs to gene symbols to match my dataset
library(EnsDb.Hsapiens.v86)
geneids <- mapIds(EnsDb.Hsapiens.v86,
              keys = rownames(seuMTG.chimp.sce),
              column = 'SYMBOL',
              keytype = 'GENEID')
#sanity check: the mapping should be in the same order as the rows
stopifnot(all(rownames(seuMTG.chimp.sce) == names(geneids)))
#drop genes that have no symbol, then rename the remaining rows
keep <- !is.na(geneids)
geneids <- geneids[keep]
seuMTG.chimp.sce <- seuMTG.chimp.sce[keep,]
rownames(seuMTG.chimp.sce) <- geneids

#remove intermediate objects from environment:
rm(geneids)
rm(keep)

#SingleR command
pfc.chimp.mtg.pred <- SingleR(test = PFC.merged.sce,
                              ref = seuMTG.chimp.sce,
                              labels = seuMTG.chimp.sce$Cluster,
                              de.method = "wilcox")

#This is where R crashes after a few hours


sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v86_2.99.0      ensembldb_2.22.0              
 [3] AnnotationFilter_1.22.0        GenomicFeatures_1.50.3        
 [5] AnnotationDbi_1.60.0           pheatmap_1.0.12               
 [7] scran_1.26.2                   tidySingleCellExperiment_1.8.2
 [9] ttservice_0.4.0                scCustomize_2.0.1             
[11] enrichR_3.2                    ggpubr_0.6.0                  
[13] scuttle_1.8.4                  reshape2_1.4.4                
[15] scRNAseq_2.12.0                SingleCellExperiment_1.20.1   
[17] lubridate_1.9.3                forcats_1.0.0                 
[19] stringr_1.5.0                  dplyr_1.1.4                   
[21] purrr_1.0.1                    readr_2.1.4                   
[23] tidyr_1.3.0                    tibble_3.2.1                  
[25] ggplot2_3.4.4                  tidyverse_2.0.0               
[27] SingleR_2.0.0                  SummarizedExperiment_1.28.0   
[29] Biobase_2.58.0                 GenomicRanges_1.50.2          
[31] GenomeInfoDb_1.34.6            IRanges_2.32.0                
[33] S4Vectors_0.36.1               BiocGenerics_0.44.0           
[35] MatrixGenerics_1.10.0          matrixStats_0.63.0            
[37] Seurat_5.0.0                   SeuratObject_5.0.1            
[39] sp_2.1-1                      

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3                ggprism_1.0.4                 rtracklayer_1.58.0           
  [4] scattermore_1.2               bit64_4.0.5                   irlba_2.3.5.1                
  [7] DelayedArray_0.24.0           data.table_1.14.6             KEGGREST_1.38.0              
 [10] RCurl_1.98-1.9                generics_0.1.3                ScaledMatrix_1.6.0           
 [13] cowplot_1.1.1                 RSQLite_2.2.20                RANN_2.6.1                   
 [16] future_1.33.0                 bit_4.0.5                     tzdb_0.3.0                   
 [19] spatstat.data_3.0-3           xml2_1.3.3                    httpuv_1.6.8                 
 [22] assertthat_0.2.1              hms_1.1.2                     promises_1.2.0.1             
 [25] fansi_1.0.3                   restfulr_0.0.15               progress_1.2.2               
 [28] dbplyr_2.3.0                  igraph_1.5.1                  DBI_1.1.3                    
 [31] htmlwidgets_1.6.1             spatstat.geom_3.2-7           paletteer_1.5.0              
 [34] ellipsis_0.3.2                RSpectra_0.16-1               backports_1.4.1              
 [37] biomaRt_2.54.0                deldir_1.0-9                  sparseMatrixStats_1.10.0     
 [40] vctrs_0.6.4                   remotes_2.4.2.1               ROCR_1.0-11                  
 [43] abind_1.4-5                   cachem_1.0.6                  withr_2.5.0                  
 [46] progressr_0.14.0              vroom_1.6.1                   sctransform_0.4.1            
 [49] GenomicAlignments_1.34.0      prettyunits_1.1.1             goftest_1.2-3                
 [52] cluster_2.1.4                 ExperimentHub_2.6.0           dotCall64_1.1-0              
 [55] lazyeval_0.2.2                crayon_1.5.2                  spatstat.explore_3.2-5       
 [58] edgeR_3.40.2                  pkgconfig_2.0.3               nlme_3.1-161                 
 [61] vipor_0.4.5                   ProtGenerics_1.30.0           rlang_1.1.2                  
 [64] globals_0.16.2                lifecycle_1.0.3               miniUI_0.1.1.1               
 [67] filelock_1.0.2                fastDummies_1.7.3             BiocFileCache_2.6.0          
 [70] rsvd_1.0.5                    AnnotationHub_3.6.0           ggrastr_1.0.2                
 [73] polyclip_1.10-6               RcppHNSW_0.5.0                lmtest_0.9-40                
 [76] Matrix_1.6-3                  carData_3.0-5                 zoo_1.8-11                   
 [79] beeswarm_0.4.0                ggridges_0.5.4                GlobalOptions_0.1.2          
 [82] png_0.1-8                     viridisLite_0.4.1             rjson_0.2.21                 
 [85] bitops_1.0-7                  KernSmooth_2.23-20            spam_2.10-0                  
 [88] Biostrings_2.66.0             blob_1.2.3                    DelayedMatrixStats_1.20.0    
 [91] shape_1.4.6                   parallelly_1.36.0             spatstat.random_3.2-1        
 [94] rstatix_0.7.2                 ggsignif_0.6.4                beachmat_2.14.2              
 [97] scales_1.2.1                  memoise_2.0.1                 magrittr_2.0.3               
[100] plyr_1.8.8                    ica_1.0-3                     zlibbioc_1.44.0              
[103] compiler_4.2.2                dqrng_0.3.1                   BiocIO_1.8.0                 
[106] RColorBrewer_1.1-3            fitdistrplus_1.1-11           snakecase_0.11.1             
[109] Rsamtools_2.14.0              cli_3.6.0                     XVector_0.38.0               
[112] listenv_0.9.0                 patchwork_1.1.3               pbapply_1.7-2                
[115] MASS_7.3-58.1                 tidyselect_1.2.0              stringi_1.7.12               
[118] yaml_2.3.6                    locfit_1.5-9.7                BiocSingular_1.14.0          
[121] ggrepel_0.9.4                 grid_4.2.2                    tools_4.2.2                  
[124] timechange_0.2.0              future.apply_1.11.0           parallel_4.2.2               
[127] circlize_0.4.15               rstudioapi_0.14               bluster_1.8.0                
[130] metapod_1.6.0                 janitor_2.2.0                 gridExtra_2.3                
[133] Rtsne_0.16                    digest_0.6.31                 BiocManager_1.30.19          
[136] shiny_1.7.4                   Rcpp_1.0.9                    car_3.1-2                    
[139] broom_1.0.5                   BiocVersion_3.16.0            later_1.3.0                  
[142] RcppAnnoy_0.0.21              WriteXLS_6.4.0                httr_1.4.7                   
[145] colorspace_2.1-0              XML_3.99-0.13                 tensor_1.5                   
[148] reticulate_1.34.0             splines_4.2.2                 statmod_1.5.0                
[151] uwot_0.1.16                   rematch2_2.1.2                spatstat.utils_3.0-4         
[154] plotly_4.10.3                 xtable_1.8-4                  jsonlite_1.8.4               
[157] R6_2.5.1                      pillar_1.9.0                  htmltools_0.5.4              
[160] mime_0.12                     glue_1.6.2                    fastmap_1.1.0                
[163] BiocParallel_1.32.5           BiocNeighbors_1.16.0          interactiveDisplayBase_1.36.0
[166] codetools_0.2-18              utf8_1.2.2                    lattice_0.20-45              
[169] spatstat.sparse_3.0-3         curl_5.1.0                    ggbeeswarm_0.7.2             
[172] leiden_0.4.3.1                limma_3.54.0                  survival_3.5-0               
[175] munsell_0.5.0                 GenomeInfoDbData_1.2.9        gtable_0.3.1
SingleR SingleCellExperiment
@james-w-macdonald-5106

The cellxgene dataset you are using as a reference has over 112k cells. You might try subsetting the columns of the data to include a reasonable number of cells of each type. You could also subset the rows based on the distribution of the row variances. Genes that don't really change expression won't be helpful anyway.
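
For anyone wanting to try the column subsetting, here is a minimal base-R sketch of capping the number of cells per label. The helper name `downsample_per_label`, the cap of 1000 cells per label, and the toy label vector are all assumptions for illustration, not something taken from the thread or from SingleR itself.

```r
# Pick at most `max_per_label` cell (column) indices for each label.
# `labels` has one entry per cell; cells with an NA label are dropped.
downsample_per_label <- function(labels, max_per_label = 1000, seed = 1) {
  set.seed(seed)
  idx_by_label <- split(seq_along(labels), labels)
  keep <- lapply(idx_by_label, function(idx) {
    if (length(idx) <= max_per_label) {
      idx                                           # small groups are kept whole
    } else {
      idx[sample.int(length(idx), max_per_label)]   # random subset of large groups
    }
  })
  sort(unlist(keep, use.names = FALSE))
}

# Toy labels standing in for seuMTG.chimp.sce$Cluster (hypothetical values)
toy_labels <- rep(c("Astro", "L2/3 IT", "Pvalb"), times = c(5000, 300, 1200))
keep_cells <- downsample_per_label(toy_labels, max_per_label = 1000)
table(toy_labels[keep_cells])

# With the real reference this would look something like:
# ref.small <- seuMTG.chimp.sce[, downsample_per_label(seuMTG.chimp.sce$Cluster)]
```

The reduced object and its matching `$Cluster` column would then go into `SingleR()` as `ref` and `labels`. What counts as a "reasonable number" per type depends on the data, so the cap is worth adjusting.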

Thank you, James!

I ended up running Seurat's FindVariableFeatures with nfeatures = 10000, then subsetting the object to just those variable genes, and it worked!

chimp_MTG <- FindVariableFeatures(chimp_MTG, selection.method = "vst", nfeatures = 10000)
var_chimp_MTG <- subset(chimp_MTG, features = VariableFeatures(object = chimp_MTG))
varMTG.chimp.sce <- as.SingleCellExperiment(var_chimp_MTG)

Thanks! Elaine
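
A package-free variant of the same idea, ranking genes by plain row variance as suggested in the answer, might look like the sketch below. Note this is not the `vst` method that `FindVariableFeatures` uses; the helper name `top_variable_rows` and the toy matrix are assumptions for illustration only.

```r
# Return the indices of the `n` highest-variance rows of a
# genes x cells matrix (ideally log-normalized expression values).
top_variable_rows <- function(mat, n = 10000) {
  vars <- apply(mat, 1, var)            # one variance per gene
  n <- min(n, nrow(mat))                # never ask for more rows than exist
  order(vars, decreasing = TRUE)[seq_len(n)]
}

# Toy matrix: 6 genes x 50 cells, rows 2 and 5 vary the most by construction
toy <- outer(c(1, 10, 2, 1, 5, 3), seq_len(50))
rownames(toy) <- paste0("gene", 1:6)
top_rows <- top_variable_rows(toy, n = 2)
rownames(toy)[top_rows]
```

On a large sparse matrix, `apply()` would likely convert the data to a dense matrix and use a lot of memory, so a sparse-aware row variance function such as `rowVars()` from MatrixGenerics (already attached in the session above) is probably the better choice for computing `vars` on real data.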
