Question: HDF5Array Single cell experiment objects - can on-disk files change?
5 weeks ago, sarah.williams wrote:

Basic question about working with HDF5-backed / sparse / DelayedArray SingleCellExperiment objects:

Because the data is stored on disk, could operations on a large dataset be realised directly onto the file that is currently loaded?

i.e., if I do the following,

library(HDF5Array)
sce <- loadHDF5SummarizedExperiment("original_data/")
## make lots of changes to big dataset sce ##
## maybe something computationally nasty that can't be 'delayed'?
## note: the output directory argument is 'dir', not 'file'
sce2 <- saveHDF5SummarizedExperiment(sce, dir="altered_data")


Is 'original_data' guaranteed to be unchanged?

Thanks.

Answer: HDF5Array Single cell experiment objects - can on-disk files change?
5 weeks ago, Aaron Lun (Cambridge, United Kingdom) wrote:

Any non-delayed operation is realized to the current realization backend. By default, this is an ordinary matrix held entirely in memory. If you do:

setRealizationBackend("HDF5Array")


... non-delayed results will instead be written to an HDF5Matrix. The file that backs it is given by:

getHDF5DumpFile()
## [1] "/tmp/RtmpFodSvr/HDF5Array_dump/auto00001.h5"


... which is not your original HDF5 file. This avoids overwriting the original, which would be disastrous because it would silently change data that other HDF5Array objects may still be pointing to.
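To see this for yourself, here is a minimal sketch on toy data. The temporary file and the dataset name "counts" are made up for illustration, and I pass BACKEND directly to realize() for a single call rather than changing the global backend. It should show that the realized result lives in a dump file and not in the file you started from:

```r
library(HDF5Array)  # also attaches DelayedArray

# Toy stand-in for the original on-disk data (hypothetical file and dataset name).
orig_file <- tempfile(fileext = ".h5")
m <- matrix(runif(200), nrow = 20)
x <- writeHDF5Array(m, filepath = orig_file, name = "counts")

# Checksum of the original file before doing anything to it.
before <- tools::md5sum(orig_file)

# A delayed operation: nothing is written to disk at this point.
y <- log2(x + 1)

# Force realization to an HDF5-backed matrix for this one call.
z <- realize(y, BACKEND = "HDF5Array")

# Where the realized data went, and whether the original file changed.
dump_file <- path(z)
after <- tools::md5sum(orig_file)

print(dump_file)
print(normalizePath(dump_file) != normalizePath(orig_file))
print(identical(unname(before), unname(after)))
```

If the last two lines both print TRUE, the realized values were written elsewhere and the original file is byte-for-byte the same.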

FYI, to achieve the dangerous behaviour, one would need to do:

setHDF5DumpFile("original_data/name_of_hdf5.h5")


... but then one would be silly. I'm not even sure what would happen if the HDF5 C library were asked to read from and write to the same file at once, or even the same dataset; you might see some very interesting behaviour here.

So, yes, original_data is guaranteed to be unchanged unless you explicitly point the dump file at it.
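If you want a sanity check on your own workflow, something along these lines should do. It is only a sketch: the toy SummarizedExperiment and the temporary directories are placeholders for your real data, and it simply compares checksums of the .h5 file(s) in the original directory before and after saving the altered object to a new directory:

```r
library(HDF5Array)
library(SummarizedExperiment)

# Toy dataset saved to a temporary "original_data" directory (placeholder paths).
orig_dir <- file.path(tempdir(), "original_data")
new_dir  <- file.path(tempdir(), "altered_data")
se <- SummarizedExperiment(
    assays = list(counts = matrix(rpois(200, lambda = 5), nrow = 20))
)
saveHDF5SummarizedExperiment(se, dir = orig_dir, replace = TRUE)

# Checksums of every HDF5 file in the original directory.
h5_files <- list.files(orig_dir, pattern = "\\.h5$", full.names = TRUE)
before <- tools::md5sum(h5_files)

# Load, modify (a delayed operation on the assay), and save somewhere else.
sce <- loadHDF5SummarizedExperiment(orig_dir)
assay(sce, "counts") <- log2(assay(sce, "counts") + 1)
sce2 <- saveHDF5SummarizedExperiment(sce, dir = new_dir, replace = TRUE)

# The original files should be untouched.
after <- tools::md5sum(h5_files)
unchanged <- identical(unname(before), unname(after))
print(unchanged)
```

Saving to a separate directory, as in your example, is what keeps the two copies independent; the check above just confirms it.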