I have implemented several functions based on `DelayedArray` and have succeeded in cutting small blocks out of large arrays and processing them out of core.
However, computation time is still an issue.
Next, I tried using `SparseArraySeed` and working only with the nonzero elements (`nzdata`) and their indices (`nzindex`), because calculations involving zeros can be skipped, which I expected to speed things up.
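```r
library("DelayedArray")
# Small random 0/1 array of dimensions 2 x 3 x 4
arr <- array(rbinom(2*3*4, 1, 0.5), dim=c(2,3,4))
# Convert the dense array into a SparseArraySeed
sarr <- dense2sparse(arr)
sarr@nzdata   # values of the nonzero elements
sarr@nzindex  # one row per nonzero element, one column per dimension
```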
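To illustrate why skipping zeros is safe for some operations, here is a base-R sketch (no `DelayedArray` needed) that builds the same kind of coordinate layout as the `nzindex`/`nzdata` slots: an integer matrix with one row per nonzero element, plus the matching values. It only mimics that layout; it is not how `dense2sparse()` is implemented.

```r
# Base-R sketch of a coordinate (COO) layout similar to the
# nzindex/nzdata slots of a SparseArraySeed (not the package's own code).
set.seed(1)
arr <- array(rbinom(2 * 3 * 4, 1, 0.5), dim = c(2, 3, 4))

# One row per nonzero element, one column per dimension.
nzindex <- which(arr != 0, arr.ind = TRUE)
# Values at those positions, in the same order as the rows of nzindex.
nzdata <- arr[nzindex]

# A sum only needs the nonzero values: the zeros contribute nothing.
total <- sum(nzdata)

# Rebuild the dense array to check that no information was lost.
rebuilt <- array(0, dim = dim(arr))
rebuilt[nzindex] <- nzdata
print(total)
```

Operations in which a zero contributes nothing (sums, counts of nonzeros, element-wise products with a sparse operand) can work on `nzdata` alone; something like a mean still needs the total number of elements, `prod(dim(arr))`.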
However, `nzdata` and `nzindex` are assumed to be in memory, and for extremely large sparse arrays it may not be possible to hold them all in memory.
Is there a way to read `nzdata` and `nzindex` sequentially from a file and use them?
According to the documentation of `DelayedArray`, `HDF5Array`, and `TileDBArray`, the `as.sparse` option of `writeHDF5Array` only flags the returned object as sparse (it sets the sparse flag to `TRUE`) while the data are still stored in dense format, whereas the `as.sparse` option of `writeTileDBArray` actually stores the data in sparse format in TileDB.
So I think that using `TileDBArray` may solve this problem.
Thank you in advance.
Koki
Sorry, I realized that I can set `as.sparse=TRUE` in `read_block` and use `@nzindex` and `@nzdata` in the returned `SparseArraySeed`.