How to save `SingleCellExperiment` with `DelayedArray` assay(s) in a portable format?
merv ▴ 140
@mmfansler-13248
Last seen 9 weeks ago
MSKCC | New York, NY

Here is a toy SingleCellExperiment object with a counts assay using an HDF5 realization:

library(Matrix)
library(HDF5Array)
library(SingleCellExperiment)
library(magrittr)

cts <- writeHDF5Array(rsparsematrix(10,10,0.1), "test.h5", 
                      as.sparse=TRUE, with.dimnames=TRUE)

sce <- SingleCellExperiment(assays=list(counts=cts))

saveRDS(sce, "test.rds")

If I relocate these two files, test.h5 and test.rds, say to a subfolder, foo, then the RDS object still loads fine, but the counts assay can no longer find the realization, since it appears to have recorded an absolute path. For example, I see this behavior:

sce_reload <- readRDS("foo/test.rds")
counts(sce_reload)
## Error in value[[3L]](cond) : 
##  'assay(<SingleCellExperiment>, i="character", ...)' invalid subscript 'i'
## H5Fis_hdf5() returned an error

Examining the object, I can see that sce_reload@assays@data@listData$counts@seed@filepath still points to the original absolute path.

Is there a way to save a SingleCellExperiment object with DelayedArray assays so that it is portable, i.e., so that the files can be shared or moved and the object still loads?

HDF5Array SingleCellExperiment DelayedArray • 3.4k views
merv ▴ 140
@mmfansler-13248
Last seen 9 weeks ago
MSKCC | New York, NY

The saveHDF5SummarizedExperiment and loadHDF5SummarizedExperiment functions from HDF5Array appear to handle the path bookkeeping that makes this possible. That is, we can do:

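library(Matrix)
library(HDF5Array)
library(SingleCellExperiment)

# In-memory sparse counts; the assay is written to HDF5 on save
cts <- rsparsematrix(10,10,0.1)
sce <- SingleCellExperiment(assays=list(counts=cts))

saveHDF5SummarizedExperiment(sce, ".", prefix="test_")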

which writes out test_se.rds and test_assays.h5 to the current directory. Moving these to a foo folder, we then can do:

sce_reload <- loadHDF5SummarizedExperiment("foo", prefix="test_")

counts(sce_reload)
## <10 x 10> sparse matrix of class HDF5Matrix and type "double":
##        [,1]  [,2]  [,3] ...  [,9] [,10]
##  [1,]  0.00  0.00  0.63   .  0.00  0.72
##  [2,]  0.00  0.00  0.00   .  0.00  0.00
##  [3,]  0.00  0.00  0.00   .  0.00  0.00
##  [4,]  0.00  0.00  0.00   .  0.00  0.00
##  [5,]  0.00  0.00  0.00   .  0.00  0.00
##  [6,]  0.00  0.00  0.00   .  0.00  0.00
##  [7,]  0.00  0.00 -1.10   .  0.00  0.00
##  [8,]  0.00  0.00  0.00   .  0.00 -0.44
##  [9,]  0.00  0.00  0.00   .  0.00  0.00
## [10,]  0.00  0.00  0.00   .  0.00  0.00

Looking at the code, these functions rewrite the file path stored in the assay so that it is no longer absolute before writing the RDS file; on loading, they convert it back to an absolute path based on the location of the RDS file.
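To make that mechanism concrete, here is a minimal base-R sketch of the same idea. It is not the HDF5Array implementation: a plain list with a `filepath` element stands in for the assay seed, and `save_portable` and `load_portable` are hypothetical names made up for illustration.

```r
# Hypothetical helpers (NOT part of HDF5Array); a plain list stands in for the seed.
save_portable <- function(obj, dir, prefix = "") {
  # Keep only the file name, so the RDS no longer depends on where `dir` lives
  obj$filepath <- basename(obj$filepath)
  saveRDS(obj, file.path(dir, paste0(prefix, "se.rds")))
  invisible(obj)
}

load_portable <- function(dir, prefix = "") {
  obj <- readRDS(file.path(dir, paste0(prefix, "se.rds")))
  # Rebuild an absolute path from the directory the RDS was read from
  obj$filepath <- file.path(normalizePath(dir), obj$filepath)
  obj
}

# Usage: save next to the data file, move both files, then reload
src <- tempfile("src"); dst <- tempfile("foo")
dir.create(src); dir.create(dst)
writeLines("payload", file.path(src, "test_assays.h5"))  # dummy data file, not real HDF5
obj <- list(filepath = normalizePath(file.path(src, "test_assays.h5")))
save_portable(obj, src, prefix = "test_")

files <- c("test_se.rds", "test_assays.h5")
moved <- file.rename(file.path(src, files), file.path(dst, files))

reloaded <- load_portable(dst, prefix = "test_")
file.exists(reloaded$filepath)
```

The point of the design is that the RDS file and the data file only need to stay in the same directory; the absolute location is recomputed every time the object is loaded.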
