There may be much smarter ways to do this (I'm taking the over!), but what I have done in the past is to create an HDF5 file with pre-specified dimensions and then stick matrices of the data in, one at a time. So something like
library(HDF5Array)
fn <- <file name goes here>
h5createFile(fn)
h5createDataset(fn, "somename", c(NCOL, NROW), <other args go here>)
for(i in 1:num_matrices){
    mat <- <make a matrix of your data, or read it in or whatever>
    h5write(mat, fn, "somename", FALSE, index = list(<rows the data are going into>, <columns the data are going into>))
}
H5close()
You might want to specify what sort of data you will be adding, using the storage.mode argument for h5createDataset, and the chunk sizes you want to be able to read back out. Then the only trick is to get the row and column counting right.
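To make the row and column counting concrete, here is a minimal runnable sketch of the same idea using rhdf5 directly. The sizes, the temporary file and the fill values are made up for illustration, and the dimensions are given as c(rows, columns).

```r
library(rhdf5)

# Hypothetical sizes, chosen only to keep the example small
n_row <- 6L
n_col <- 4L
rows_per_block <- 2L

fn <- tempfile(fileext = ".h5")
h5createFile(fn)

# dims is c(NROW, NCOL); chunk sets the block size used for storage on disk
h5createDataset(fn, "somename", dims = c(n_row, n_col),
                storage.mode = "integer", chunk = c(rows_per_block, n_col))

# Write one block of rows at a time
for (start in seq(1L, n_row, by = rows_per_block)) {
  rows <- start:(start + rows_per_block - 1L)
  # Stand-in for "make a matrix of your data": each cell holds its row number
  mat <- matrix(rows, nrow = length(rows), ncol = n_col)
  h5write(mat, fn, "somename", index = list(rows, seq_len(n_col)))
}

# Read the whole dataset back to check the blocks landed where expected
result <- h5read(fn, "somename")
```

Here the index list names the rows and columns each block goes into, so the only bookkeeping is keeping `rows` in step with the block being written.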
I used this general idea for a collaborator who wanted a SummarizedExperiment where, for any common biallelic SNP (like 8 million or so? I forget) and any tissue from the GTEx consortium, there would be a boolean saying if the SNP overlapped a given eQTL. Since I couldn't create a matrix that large in R, I just generated it in chunks and put those into the HDF5 file, then wrapped that in the SummarizedExperiment.
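The wrapping step might look roughly like the sketch below. It assumes the HDF5Array and SummarizedExperiment packages are installed. The tiny 0/1 matrix and the dataset name "overlap" are placeholders, not the real SNP-by-tissue data.

```r
library(rhdf5)
library(HDF5Array)
library(SummarizedExperiment)

fn <- tempfile(fileext = ".h5")
h5createFile(fn)

# Placeholder data: a small 0/1 matrix standing in for the overlap indicators
mat <- matrix(rep(c(0L, 1L), length.out = 15), nrow = 5, ncol = 3)
h5write(mat, fn, "overlap")

# HDF5Array() points at the on-disk dataset without loading it into memory
overlap <- HDF5Array(fn, "overlap")

# The disk-backed matrix is used as the assay of the SummarizedExperiment
se <- SummarizedExperiment(assays = list(overlap = overlap))
```

The assay stays on disk, so subsetting `se` only reads the parts of the file that are actually needed.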
In h5createDataset, the third argument is c(NROW, NCOL), the total number of rows and columns you will need, not the opposite as I showed in my pseudo-code.
This is the way I'd do it too. I'd only add a couple of things:
I'd be careful with H5close(). I think it can be a bit heavy-handed and under some circumstances will close the underlying HDF5 library, which then breaks the R/HDF5 interface and results in weird errors that are hard to debug. In this code example you shouldn't have to use anything, as h5write() will tidy up after itself, but there's also h5closeAll() if you want to be certain all files have been flushed and closed.
Thanks for the advice about H5close. I was under the impression that you had to tell the HDF5 library to 'finish things off', but if h5write already pretty much handles that, all the better.
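For reference, a small sketch of the h5closeAll() route mentioned above. The explicitly opened file handle is only there to give h5closeAll() something to close, and the file and dataset names are placeholders.

```r
library(rhdf5)

fn <- tempfile(fileext = ".h5")
h5createFile(fn)

# h5write() opens the file, writes, and closes its own handles again
h5write(matrix(1:6, nrow = 2), fn, "somename")

# A handle opened by hand stays open until it is closed explicitly
fid <- H5Fopen(fn)

# Close every HDF5 handle that is still open in this R session
h5closeAll()

# The file can still be read afterwards
result <- h5read(fn, "somename")
```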