Hi developer,
I have a dataset with more than 30,000,000 cells * 30 genes, pooled from multiple samples. I want to use the runTSNE or calculateTSNE function for dimensionality reduction. Is there a feasible way to do this at that scale? Much appreciated!
You can use snifter, which has a wrapper in scater. See the example code below. I think UMAP also scales pretty well with the number of cells. Either is likely to take several hours.
Be aware that I have 32GB of RAM on my local machine and this runs out of memory very quickly. It's been running a few hours on the cluster, not sure how long it'll take in total.
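library("scater")
library("SingleCellExperiment")
library("DelayedArray")
# Simulated stand-in for the real data: 30 features (rows) x 30 million cells (columns)
mat <- DelayedArray(matrix(rnorm(30*30000000), nrow = 30))
sce <- SingleCellExperiment(assays = list(logcounts=mat))
# use_fitsne=TRUE runs the FIt-SNE implementation from snifter
sce <- runTSNE(sce, use_fitsne=TRUE)
sce <- runUMAP(sce)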
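For a rough sense of why 32GB fills up quickly, here is a back-of-the-envelope estimate of the dense input matrix alone. It is only a sketch: the variable names are illustrative, and it does not count temporary copies or the nearest-neighbour structures that t-SNE and UMAP build, so the real peak will be higher.

```r
# Dimensions taken from the question: 30 features x 30 million cells
n_features <- 30
n_cells <- 30e6
bytes_per_double <- 8  # R stores numeric values as 8-byte doubles

# Size of the dense input matrix in GB (10^9 bytes)
dense_gb <- n_features * n_cells * bytes_per_double / 1e9

# Size of a 2-column embedding (one row per cell) in GB
embedding_gb <- 2 * n_cells * bytes_per_double / 1e9

dense_gb
embedding_gb
```

The input matrix comes to about 7.2 GB and each 2D embedding to about 0.48 GB, before any working memory is counted.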
alan.ocallaghan Sounds like a feasible plan to me. Thanks! runTSNE and runUMAP offer a "num_threads" parameter. If I give them more threads, will the computation be faster?
Do you truly mean 30 million cells? That is orders of magnitude larger than any scRNA-seq dataset I have seen. I've never seen a t-SNE plot for more than about 200,000 cells at a time.
Gordon Smyth Yes, it is about 30 million cells, coming from more than 100 samples. It is not scRNA-seq; the data come from mass cytometry.
An alternative would be to cluster your cells first and then downsample your cells by stratifying per cluster.
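The downsampling step could look something like the sketch below, in base R. It assumes you already have one cluster label per cell from whatever clustering method you prefer, and it takes one possible reading of "stratifying per cluster": keep at most a fixed number of cells from each cluster. The helper `downsample_per_cluster` and the toy labels are hypothetical, not part of scater.

```r
# Hypothetical helper: keep at most `n_per_cluster` cells from each cluster.
# `clusters` is a vector with one cluster label per cell.
downsample_per_cluster <- function(clusters, n_per_cluster) {
  # Group cell indices by cluster label
  idx_by_cluster <- split(seq_along(clusters), clusters)
  keep <- lapply(idx_by_cluster, function(idx) {
    if (length(idx) <= n_per_cluster) {
      idx  # small cluster: keep every cell
    } else {
      # sample positions rather than values, so a single index is handled safely
      idx[sample.int(length(idx), n_per_cluster)]
    }
  })
  sort(unlist(keep, use.names = FALSE))
}

set.seed(1)
# Toy labels standing in for real clustering output (unbalanced on purpose)
clusters <- sample(c("A", "B", "C"), size = 10000, replace = TRUE,
                   prob = c(0.7, 0.25, 0.05))
keep <- downsample_per_cluster(clusters, n_per_cluster = 200)
table(clusters[keep])
```

With a SingleCellExperiment, the retained columns could then be selected with `sce[, keep]` before calling runTSNE or runUMAP. A fixed cap keeps rare clusters visible in the plot; sampling a fixed fraction per cluster instead would preserve the relative cluster sizes.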