Combining multiple MTX (matrix market) files to HDF5 Array format
merv ▴ 140
@mmfansler-13248
Last seen 5 weeks ago
MSKCC | New York, NY

I have multiple MTX files output from a single-cell kallisto | bustools pipeline that I would like to filter and combine into a single SingleCellExperiment. Previously, I did this by reading each file with Matrix::readMM and converting the result to a sparse matrix format (CsparseMatrix). I've started hitting scalability issues with this approach and would like to switch to a DelayedArray implementation (like HDF5Array) for the counts assay.
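For reference, the in-memory approach described above looks roughly like the sketch below. The MTX files here are temporary stand-ins written with Matrix::writeMM, and the filter (dropping all-zero columns) is only a placeholder assumption for whatever per-sample filtering is actually applied.

```r
library(Matrix)
library(methods)

set.seed(1)

## Hypothetical stand-ins for the pipeline output: two small MTX files
mtx_files <- c(tempfile(fileext = ".mtx"), tempfile(fileext = ".mtx"))
for (f in mtx_files) {
  writeMM(rsparsematrix(10, 10, 0.1), f)
}

## Read each file (readMM returns a triplet-format matrix) and
## convert it to column-compressed form
mats <- lapply(mtx_files, function(f) as(readMM(f), "CsparseMatrix"))

## Placeholder filter: keep only columns with at least one nonzero entry
mats <- lapply(mats, function(m) m[, Matrix::colSums(m != 0) > 0, drop = FALSE])

## Combine all samples in memory; this is the step that stops scaling
cts_all_mem <- do.call(cbind, mats)
```

Everything is held in RAM at once here, which is why the combined matrix becomes the bottleneck as the number of samples grows.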

Is there a function to load such files directly as DelayedArray objects?

Is there a recommended way to serially build up an HDF5 array from multiple samples?


Current Strategy

I have not been able to find such a method, and I am struggling to find documentation on creating HDF5 arrays from scratch. So far, I'm still loading each file as a CsparseMatrix, filtering it, and then saving it out with writeTENxMatrix. Roughly, something like:

library(Matrix)
library(HDF5Array)
library(SingleCellExperiment)
library(magrittr)

## example of input files once loaded as `CsparseMatrix` objects and filtered
cts1 <- rsparsematrix(10, 10, 0.1)
cts2 <- rsparsematrix(10, 10, 0.1)

## dump each out to HDF5 (temp)
cts1 <- writeTENxMatrix(cts1)
cts2 <- writeTENxMatrix(cts2)

## create SCEs and combine
sce <- list(cts1, cts2) %>%
  lapply(function (x) { SingleCellExperiment(assays=list(counts=x)) }) %>%
  do.call(what=cbind)

## dump out combined HDF5
cts_all <- writeHDF5Array(counts(sce), "test.h5", 
                          as.sparse=TRUE, verbose=TRUE, with.dimnames=TRUE)

## recreate SCE with unified counts
sce <- SingleCellExperiment(assays=list(counts=cts_all), colData=colData(sce))

## save final SCE
saveRDS(sce, "test.Rds")

Any pointers to better working with this format would be appreciated. The above seems convoluted. Ultimately, I want one HDF5 and one RDS file that are portable and contain all the data.
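One possible simplification I am considering, sketched here under assumptions rather than as a confirmed recommendation: HDF5Array provides saveHDF5SummarizedExperiment, which writes the assays to a single HDF5 file and the rest of the object to an RDS file inside one directory. The output directory name and the dimnames below are made up for illustration.

```r
library(Matrix)
library(HDF5Array)
library(SingleCellExperiment)

set.seed(1)

## Stand-ins for two filtered samples (hypothetical gene and cell names)
cts1 <- rsparsematrix(10, 10, 0.1)
cts2 <- rsparsematrix(10, 10, 0.1)
rownames(cts1) <- rownames(cts2) <- paste0("gene", 1:10)
colnames(cts1) <- paste0("s1_cell", 1:10)
colnames(cts2) <- paste0("s2_cell", 1:10)

## Write each sample to a temporary on-disk sparse matrix, then combine;
## the cbind is delayed, so no data is loaded at this point
sce <- cbind(
  SingleCellExperiment(assays = list(counts = writeTENxMatrix(cts1))),
  SingleCellExperiment(assays = list(counts = writeTENxMatrix(cts2)))
)

## Realize the combined assay into one HDF5 file plus one RDS file,
## both placed in `out_dir` (assumed location)
out_dir <- file.path(tempdir(), "sce_h5")
sce <- saveHDF5SummarizedExperiment(sce, dir = out_dir, replace = TRUE)

## The directory can be moved as a unit and reloaded elsewhere
sce2 <- loadHDF5SummarizedExperiment(out_dir)
```

This would replace the manual writeHDF5Array / rebuild / saveRDS steps with one call, at the cost of the two files living together in a directory rather than at arbitrary paths.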

HDF5Array SingleCellExperiment DelayedArray • 1.3k views