Question: Sparse matrices in Bioconductor objects for single-cell analyses
5 months ago, sandmann.t wrote:

Dear Bioconductors, 

More and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al. made count matrices with 34,016 genes x 37,248 samples (= cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent scater package.

Single-cell data are very sparse, with up to 90% zero counts, and can be stored very efficiently in sparse matrices, e.g. using the Matrix package. Yet while scater's newSCESet function accepts a sparse matrix, it seems to coerce it into a regular (dense) matrix right away, which requires much more memory (and storage space).
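To illustrate the memory difference, here is a minimal sketch on simulated data. The dimensions, the roughly 90% zero fraction and the variable names are assumptions chosen for illustration, not taken from the Keren-Shaul dataset:

```r
library(Matrix)

set.seed(1)
n_genes <- 2000
n_cells <- 500

# Simulated counts: Poisson values kept with probability 0.1,
# so about 90% of the entries end up as zero (illustrative assumption).
counts <- rpois(n_genes * n_cells, lambda = 2) *
  rbinom(n_genes * n_cells, size = 1, prob = 0.1)

dense  <- matrix(as.numeric(counts), nrow = n_genes, ncol = n_cells)
sparse <- Matrix(dense, sparse = TRUE)  # compressed sparse column storage

class(sparse)        # class of the sparse representation
mean(dense == 0)     # observed fraction of zeros
object.size(dense)   # memory used by the dense matrix
object.size(sparse)  # memory used by the sparse matrix
```

The sparse object only stores the non-zero values and their positions, so the saving grows with the fraction of zeros.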

I can simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure and take advantage of the sparse nature of the data? 

And in case the scater authors are listening: do you have any plans to support sparse matrices?

Any recommendations are appreciated.

Thanks,

Thomas

5 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Well, this question might as well have my name written on it.

Yes, we are planning to adapt the SCESet object to accept sparse matrices; see https://github.com/drisso/SingleCellExperiment for a class based on SummarizedExperiment (which happily takes sparse matrices, as well as disk-based representations such as instances of the HDF5Array class).
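As a rough sketch of what storing a sparse matrix in a SummarizedExperiment might look like (the simulated counts and the gene/cell names are made up for illustration, and this uses the plain SummarizedExperiment constructor rather than the SingleCellExperiment class under development):

```r
library(Matrix)
library(SummarizedExperiment)

set.seed(1)
# Simulated sparse counts with about 10% non-zero entries (illustrative only).
counts <- rsparsematrix(1000, 200, density = 0.1,
                        rand.x = function(n) rpois(n, lambda = 5) + 1)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("cell", seq_len(ncol(counts)))

# The sparse matrix is passed in as an assay without converting it first.
se <- SummarizedExperiment(assays = list(counts = counts))

class(assay(se, "counts"))  # check which representation the assay has
dim(se)
```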

Most of the R code should then work interchangeably with dense, sparse and file-backed matrices. The C++ code will take some more effort but we have written an API (https://github.com/LTLA/beachmat) which allows package code to be written in a manner that is agnostic to the exact representation of the matrix.
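On the R side, the idea can be sketched roughly as follows. The helper `library_sizes` is a hypothetical name used only for this example, and the sketch says nothing about the beachmat C++ API itself:

```r
library(Matrix)

# Hypothetical helper: per-cell totals. Matrix::colSums dispatches on the
# class of its argument, so the same call handles dense and sparse input.
library_sizes <- function(counts) {
  Matrix::colSums(counts)
}

dense <- matrix(c(0, 2, 0,
                  0, 1, 0,
                  3, 0, 0,
                  0, 0, 4), nrow = 3)     # 3 genes x 4 cells, filled by column
sparse <- Matrix(dense, sparse = TRUE)

library_sizes(dense)
library_sizes(sparse)
```

Both calls should give the same totals, which is the property that lets downstream code ignore how the matrix is stored.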

All of these goodies should hopefully come out in the next release, sometime in September.


That's great! Thanks a lot for sharing your progress, plans and especially the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)

(reply written 5 months ago by sandmann.t)