Dear Bioconductors,

More and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al. made count matrices with 34,016 genes x 37,248 samples (= cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent `scater` package.

Single-cell data are very sparse, with up to 90% zero counts, and can be stored very efficiently in sparse matrices, e.g. using the Matrix package. Yet it seems that while `scater`'s `newSCESet` function accepts a sparse matrix, it coerces it into a regular (dense) matrix right away, which requires much more memory (and storage space).
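To illustrate the difference in memory, here is a minimal sketch with simulated counts. The dimensions and the Poisson rate are made up for illustration (far smaller than the real dataset), chosen so that roughly 90% of the entries are zero:

```r
library(Matrix)

set.seed(1)
n_genes <- 2000  # hypothetical size, much smaller than the GEO matrices
n_cells <- 1000

# Poisson counts with lambda = 0.1 are zero about 90% of the time
dense <- matrix(rpois(n_genes * n_cells, lambda = 0.1),
                nrow = n_genes, ncol = n_cells)

# Same data in compressed sparse column form (a dgCMatrix)
sparse <- Matrix(dense, sparse = TRUE)

# Compare the in-memory footprint of the two representations
print(object.size(dense), units = "MB")
print(object.size(sparse), units = "MB")
```

The sparse object only stores the non-zero values and their positions, so its size grows with the number of non-zero counts rather than with genes x cells.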

I can simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure and take advantage of the sparse nature of the data?
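For reference, this is roughly what I mean by filtering first: a sketch that applies an abundance filter while the data are still sparse and only then converts the retained genes to a dense matrix. The simulated data and the cutoff of 20 cells are hypothetical, not a recommendation:

```r
library(Matrix)

set.seed(42)
# Simulated sparse counts: 500 genes x 200 cells, about 10% non-zero entries
counts <- rsparsematrix(nrow = 500, ncol = 200, density = 0.1,
                        rand.x = function(n) rpois(n, 2) + 1)

# Number of cells in which each gene is detected, computed on the sparse matrix
cells_detected <- rowSums(counts > 0)

# Hypothetical abundance filter: keep genes detected in at least 20 cells
keep <- cells_detected >= 20

# Only the retained genes are converted to a regular matrix
dense_subset <- as.matrix(counts[keep, , drop = FALSE])
dim(dense_subset)
```

This reduces the size of the dense matrix, but it does not avoid the coercion itself, hence my question.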

And in case the `scater` authors are listening: do you have any plans to make use of sparse matrices?

Any recommendations are appreciated.

Thanks,

Thomas

That's great! Thanks a lot for sharing your progress, plans and especially the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)