Question: Sparse matrices in Bioconductor objects for single-cell analyses
sandmann.t wrote (2.5 years ago):

Dear Bioconductors, 

More and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al. made count matrices with 34,016 genes x 37,248 samples (cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent scater package.

Single-cell data are very sparse, with up to 90% zero counts, and can be stored very efficiently in sparse matrices, e.g. using the Matrix package. However, while scater's newSCESet function accepts a sparse matrix, it seems to coerce it into a regular (dense) matrix right away, which requires much more memory (and storage space).
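
To make the memory argument concrete, here is a minimal sketch, written in Python purely for illustration, of the compressed sparse column (CSC) layout that the Matrix package's dgCMatrix class uses: 0-based row indices `i`, column pointers `p` and non-zero values `x`. The byte estimates assume 8-byte doubles and 4-byte integers and ignore object overhead. The 90% zero fraction is the upper figure mentioned above, not a measured value for this dataset.

```python
"""Sketch: CSC storage vs. a dense matrix of doubles.

Assumption: the (i, p, x) layout mirrors Matrix's dgCMatrix slots;
this is an illustration, not the Matrix implementation.
"""


def dense_to_csc(dense):
    """Convert a list-of-rows dense matrix into CSC arrays (i, p, x)."""
    nrow = len(dense)
    ncol = len(dense[0]) if nrow else 0
    i, p, x = [], [0], []
    for j in range(ncol):            # walk the matrix column by column
        for r in range(nrow):
            value = dense[r][j]
            if value != 0:           # store only non-zero entries
                i.append(r)          # 0-based row index
                x.append(float(value))
        p.append(len(x))             # p[j + 1] marks the end of column j
    return i, p, x


def dense_bytes(nrow, ncol, bytes_per_value=8):
    """Memory for a dense matrix, assuming 8-byte doubles per entry."""
    return nrow * ncol * bytes_per_value


def csc_bytes(nnz, ncol, value_bytes=8, index_bytes=4):
    """Approximate CSC memory: values + row indices + column pointers."""
    return nnz * (value_bytes + index_bytes) + (ncol + 1) * index_bytes


if __name__ == "__main__":
    counts = [[0, 3, 0],
              [1, 0, 0],
              [0, 0, 2]]
    print(dense_to_csc(counts))  # ([1, 0, 2], [0, 1, 2, 3], [1.0, 3.0, 2.0])

    # Dimensions from the question; 10% non-zeros is an assumption.
    genes, cells = 34016, 37248
    nnz = round(genes * cells * 0.10)
    print(round(dense_bytes(genes, cells) / 1e9, 1))  # 10.1 (GB)
    print(round(csc_bytes(nnz, cells) / 1e9, 1))      # 1.5 (GB)
```

Under these assumptions the sparse representation needs roughly a seventh of the dense memory, and the saving grows as the fraction of zeros rises.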

I could simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure while taking advantage of the sparse nature of the data?
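
An abundance filter does not require a dense matrix at all; it can work directly on the stored non-zero entries. The following Python sketch assumes the CSC layout used by dgCMatrix (0-based row indices `i`, column pointers `p`, values `x`). The function names, the `min_cells` threshold and the toy data are hypothetical and chosen for illustration only.

```python
def detected_cells_per_gene(i, nrow):
    """For each row (gene), count the columns (cells) with a non-zero value."""
    counts = [0] * nrow
    for r in i:                      # each stored entry is one detection
        counts[r] += 1
    return counts


def filter_rows_csc(i, p, x, nrow, min_cells):
    """Keep rows detected in >= min_cells columns (hypothetical filter).

    Returns the filtered CSC arrays plus the original indices of kept rows.
    """
    detected = detected_cells_per_gene(i, nrow)
    kept = [r for r in range(nrow) if detected[r] >= min_cells]
    new_index = {old: new for new, old in enumerate(kept)}  # old row -> new row
    new_i, new_p, new_x = [], [0], []
    for j in range(len(p) - 1):
        for k in range(p[j], p[j + 1]):      # non-zeros of column j
            if i[k] in new_index:
                new_i.append(new_index[i[k]])
                new_x.append(x[k])
        new_p.append(len(new_x))             # end of column j in new arrays
    return new_i, new_p, new_x, kept


if __name__ == "__main__":
    # Toy 3 genes x 4 cells matrix in CSC form (hypothetical data).
    i = [0, 2, 2, 0, 0, 1]
    p = [0, 2, 3, 4, 6]
    x = [5.0, 1.0, 3.0, 2.0, 1.0, 4.0]
    print(filter_rows_csc(i, p, x, nrow=3, min_cells=2))
    # ([0, 1, 1, 0, 0], [0, 2, 3, 4, 5], [5.0, 1.0, 3.0, 2.0, 1.0], [0, 2])
```

Only the non-zero entries are visited, so the cost scales with the number of detections rather than with genes x cells. In R, the analogous per-gene detection count on a dgCMatrix should be obtainable with `Matrix::rowSums(counts > 0)`, which keeps the comparison sparse.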

And in case the scater authors are listening: do you have any plans to use sparse matrices?

Any recommendations are appreciated.

Answer: Sparse matrices in Bioconductor objects for single-cell analyses
Aaron Lun (Cambridge, United Kingdom) wrote (2.5 years ago):

Well, this question might as well have my name written on it.

Yes, we are planning to adapt the SCESet object to accept sparse matrices; see the SingleCellExperiment package for a class based on SummarizedExperiment (which happily takes sparse matrices, as well as disk-based representations such as instances of the HDF5Array class).

Most of the R code should then work interchangeably with dense, sparse and file-backed matrices. The C++ code will take more effort, but we have written an API that allows package code to be written in a manner that is agnostic to the exact representation of the matrix.
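
The general idea of representation-agnostic code, where an algorithm is written once against a small matrix interface and each storage format implements that interface, can be sketched as follows. This is a hypothetical Python illustration of the design principle only; `ColumnSource`, `get_col` and the other names are made up and are not the API referred to in the answer.

```python
from typing import List, Protocol


class ColumnSource(Protocol):
    """Hypothetical interface: report shape and return one column densely."""
    nrow: int
    ncol: int

    def get_col(self, j: int) -> List[float]: ...


class DenseMatrix:
    """Ordinary in-memory matrix stored as a list of rows."""

    def __init__(self, rows: List[List[float]]):
        self.rows = rows
        self.nrow = len(rows)
        self.ncol = len(rows[0]) if rows else 0

    def get_col(self, j: int) -> List[float]:
        return [row[j] for row in self.rows]


class CscMatrix:
    """Sparse matrix in CSC form (0-based row indices i, column pointers p)."""

    def __init__(self, i: List[int], p: List[int], x: List[float], nrow: int):
        self.i, self.p, self.x = i, p, x
        self.nrow = nrow
        self.ncol = len(p) - 1

    def get_col(self, j: int) -> List[float]:
        col = [0.0] * self.nrow      # zeros are expanded for this column only
        for k in range(self.p[j], self.p[j + 1]):
            col[self.i[k]] = self.x[k]
        return col


def library_sizes(mat: ColumnSource) -> List[float]:
    """Total counts per cell (column), written once for any ColumnSource."""
    return [sum(mat.get_col(j)) for j in range(mat.ncol)]


if __name__ == "__main__":
    dense = DenseMatrix([[0.0, 3.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 0.0, 2.0]])
    sparse = CscMatrix(i=[1, 0, 2], p=[0, 1, 2, 3], x=[1.0, 3.0, 2.0], nrow=3)
    print(library_sizes(dense))   # [1.0, 3.0, 2.0]
    print(library_sizes(sparse))  # [1.0, 3.0, 2.0]
```

Because `library_sizes` only ever asks for one column at a time, at most one column is materialized densely, and a file-backed class could implement the same `get_col` method by reading that column from disk.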

All of these goodies should hopefully come out in the next release, sometime in September.

sandmann.t replied (2.5 years ago):

That's great! Thanks a lot for sharing your progress and plans, and especially for the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)