Question: Sparse matrices in Bioconductor objects for single-cell analyses
7 months ago, sandmann.t wrote:

Dear Bioconductors, 

more and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al. made count matrices with 34,016 genes x 37,248 samples (= cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent scater package.

Single-cell data are very sparse, with up to 90% zero counts, and can be stored very efficiently in sparse matrices, e.g. using the Matrix package. Yet it seems that while scater's newSCESet function accepts a sparse matrix, it immediately coerces it into a regular (dense) matrix, which requires much more memory (and storage space).
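To make the memory difference concrete, here is a minimal sketch that compares the in-memory size of a simulated count matrix stored sparsely and densely. The dimensions, the 10% density and the Poisson-based counts are assumptions chosen purely for illustration; they are not taken from the Keren-Shaul et al. data.

```r
library(Matrix)

set.seed(1)
n_genes <- 2000   # illustrative dimensions, far smaller than a real dataset
n_cells <- 1000

# Simulate counts in which roughly 90% of the entries are zero (density = 0.1).
# The non-zero values are Poisson draws shifted by one so that they are >= 1.
counts_sparse <- rsparsematrix(
  n_genes, n_cells, density = 0.1,
  rand.x = function(n) rpois(n, lambda = 2) + 1
)

# Coercing to a base matrix stores every zero explicitly.
counts_dense <- as.matrix(counts_sparse)

# Compare the in-memory footprint of the two representations.
format(object.size(counts_sparse), units = "MB")
format(object.size(counts_dense), units = "MB")
```

The sparse object stores only the non-zero values plus their row indices and column pointers, so its size scales with the number of non-zero entries rather than with genes x cells.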

I can simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure and take advantage of the sparse nature of the data? 

And in case the scater authors are listening: do you have any plans to support sparse matrices?

Any recommendations are appreciated.



7 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Well, this question might as well have my name written on it.

Yes, we are planning to adapt the SCESet object to accept sparse matrices; see the SingleCellExperiment package for a class based on SummarizedExperiment (which happily takes sparse matrices, as well as disk-based representations such as instances of the HDF5Array class).

Most of the R code should then work interchangeably with dense, sparse and file-backed matrices. The C++ code will take some more effort, but we have written an API that allows package code to be written in a manner that is agnostic to the exact representation of the matrix.
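As a rough illustration of the sparse-matrix support described above, the sketch below stores a simulated sparse count matrix in a SingleCellExperiment object and runs the same R call on the sparse and dense forms. It assumes the Matrix and SingleCellExperiment packages are installed and that the constructor and the `counts()` accessor behave as in released versions of the package; the data and the gene and cell names are made up.

```r
library(Matrix)
library(SingleCellExperiment)

set.seed(1)
# Simulated sparse counts (about 90% zeros); purely illustrative data.
counts_sparse <- rsparsematrix(
  200, 50, density = 0.1,
  rand.x = function(n) rpois(n, lambda = 2) + 1
)
rownames(counts_sparse) <- paste0("gene", seq_len(nrow(counts_sparse)))
colnames(counts_sparse) <- paste0("cell", seq_len(ncol(counts_sparse)))

# Store the sparse matrix as the "counts" assay without coercing it.
sce <- SingleCellExperiment(assays = list(counts = counts_sparse))

# The assay should come back in its sparse representation.
class(counts(sce))

# The same R call works on the sparse and the dense form of the counts.
lib_sparse <- colSums(counts(sce))
lib_dense <- colSums(as.matrix(counts(sce)))
all.equal(unname(lib_sparse), unname(lib_dense))
```

Per-cell library sizes are computed here only as an example of code that does not need to know which matrix representation it is given.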

All of these goodies should hopefully come out in the next release, sometime in September.


That's great! Thanks a lot for sharing your progress, plans and especially the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)

Reply written 7 months ago by sandmann.t