Search
Question: Sparse matrices in Bioconductor objects for single-cell analyses
0
gravatar for sandmann.t
16 months ago by
sandmann.t40
sandmann.t40 wrote:

Dear Bioconductors, 

more and more publications include large single-cell RNA-seq datasets. For example, Keren-Shaul et al made count matrices with 34016 gene x 37248 samples (= cells) available on NCBI GEO. I am interested in using Bioconductor to analyze such data and was happy to find the single-cell analysis Bioconductor workflow and the excellent scater package.

The single-cell data are very sparse, with up to 90% of zero counts and can be very efficiently stored in sparse matrices, e.g. using the Matrix package. Yet, it seems that while scater's newSCESet function accepts a sparse matrix, it coerces it into a regular matrix right away, which requires much more memory (and storage space).

I can simply find a machine with lots of RAM or apply abundance filters before creating an SCESet, but I am curious: are there ways to use Bioconductor's infrastructure and take advantage of the sparse nature of the data? 

And in case the scater authors are listening: do you have any plans to take use sparse matrices?

Any recommendations are appreciated.

Thanks,

Thomas

ADD COMMENTlink modified 16 months ago by Aaron Lun21k • written 16 months ago by sandmann.t40
3
gravatar for Aaron Lun
16 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

Well, this question might as well have my name written on it.

Yes, we are planning to adapt the SCESet object to accept sparse matrices; see https://github.com/drisso/SingleCellExperiment for a class based on SummarizedExperiment (which happily takes sparse matrices, as well as disk-based representations such as instances of the HDF5Array class).

Most of the R code should then work interchangeably with dense, sparse and file-backed matrices. The C++ code will take some more effort but we have written an API (https://github.com/LTLA/beachmat) which allows package code to be written in a manner that is agnostic to the exact representation of the matrix.

All of these goodies should hopefully come out in the next release, sometime in September.

ADD COMMENTlink modified 16 months ago • written 16 months ago by Aaron Lun21k

That's great! Thanks a lot for sharing your progress, plans and especially the pointer to the SingleCellExperiment package. I am not surprised that you have even more awesome solutions in the works :-)

ADD REPLYlink written 16 months ago by sandmann.t40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 140 users visited in the last hour