Hi all, so I have assembled a rather large single-cell dataset that I am working on analyzing right now. The dataset consists of 15 samples spread across 5 different treatment conditions (N = 3 samples/condition). After QC etc., I have a whopping ~450,000 cells to play with. The ultimate goal is to identify small cell subtypes and perform pseudobulking to determine gene expression differences between the treatments.
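(For context, the pseudobulking step I have in mind is roughly the sketch below, summing counts per sample with scuttle::aggregateAcrossCells(); the Sample column is from my colData, and I haven't actually gotten to this step yet.)

library(scuttle)
# Sum raw counts within each sample to get per-sample pseudobulk profiles (sketch, not run yet)
pseudo_sce <- aggregateAcrossCells(SCE_all, ids = SCE_all$Sample)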
I have opted to use BioC for the analysis, and my SingleCellExperiment object is ~23 GB in memory, including the dimensionality reduction etc.
I am currently attempting to perform kNN graph-based clustering, but when I run the clusterCells() function it consumes an absurd amount of memory (I attempted to run it on an HPC cluster and got kicked off when the job used >450 GB of RAM...).
As someone who is relatively new to this kind of analysis, is it expected that this much memory would be used during steps like clustering or pseudobulking? If not, are there any hints as to why the function might be causing this issue? Here is the code:
library(scran); library(bluster)  # clusterCells() from scran, NNGraphParam() from bluster
clusterCells(SCE_all, use.dimred = "PCA", BLUSPARAM = NNGraphParam(k = 30, cluster.fun = "louvain"))
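For what it's worth, one variant I was considering trying (untested, and I'm assuming NNGraphParam() will accept a BiocNeighbors BNPARAM here) is approximate Annoy-based neighbour search, in case the exact kNN step is part of the problem:

library(BiocNeighbors)
# Same clustering call, but with approximate (Annoy) neighbour detection;
# just a guess on my part that this could reduce the memory footprint.
clusterCells(SCE_all, use.dimred = "PCA",
             BLUSPARAM = NNGraphParam(k = 30, cluster.fun = "louvain", BNPARAM = AnnoyParam()))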
And here is the info for my SCE:
class: SingleCellExperiment
dim: 30711 470269
metadata(0):
assays(2): counts logcounts
rownames(30711): Tbx2 Tbx4 ... Unknown Unknown
rowData names(0):
colnames: NULL
colData names(11): Sample Barcode ... TreatmentGroup sizeFactor
reducedDimNames(1): PCA
mainExpName: NULL
altExpNames(0):
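In case it helps, this is roughly how I've been checking the object size and the PCA dimensions (just utils::object.size() and SingleCellExperiment::reducedDim(), nothing exotic):

format(object.size(SCE_all), units = "GB")   # this is where the ~23 GB figure comes from
dim(reducedDim(SCE_all, "PCA"))              # number of cells x number of PCs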
Happy to provide any other code; I just figured asking the question more generally first would be helpful. Thanks :)