How to incrementally load nzdata and nzindex from a SparseArraySeed
Koki ▴ 10
@koki-7888
Japan

I have implemented several functions based on DelayedArray and succeeded in splitting large arrays into small blocks and processing them out of core.

However, computation time remains an issue.

Next, I tried using SparseArraySeed and working only with the non-zero elements (nzdata) and their indices (nzindex), because calculations involving zeros can then be skipped, which I expect to speed up the process.

library("DelayedArray")

arr <- array(rbinom(2*3*4, 1, 0.5), dim=c(2,3,4))
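sarr <- dense2sparse(arr)  # convert the dense array to a SparseArraySeed
sarr@nzdata   # non-zero values
sarr@nzindex  # their array coordinates, one row per non-zero element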


However, nzdata and nzindex are assumed to be held in memory, and for extremely large sparse arrays it may not be possible to load them all into memory at once.

Do you know of a way to read nzdata and nzindex sequentially from a file, one piece at a time, and use them?

According to the documentation of DelayedArray, HDF5Array, and TileDBArray, the as.sparse option of writeHDF5Array only sets the sparse flag to TRUE while the data are still stored in dense format, whereas the as.sparse option of writeTileDBArray actually stores the data in sparse format in TileDB. So I am thinking that using TileDBArray may solve this problem.

Koki

TileDBArray HDF5Array DelayedArray • 141 views

Sorry, I realized that I can pass as.sparse=TRUE to read_block and then use the @nzindex and @nzdata slots of the returned SparseArraySeed.
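
For anyone who finds this later, here is a minimal sketch of that block-wise approach. It assumes a DelayedArray version in which read_block(..., as.sparse=TRUE) returns a SparseArraySeed, as described above; newer releases may return a different sparse class with different accessors. The toy array, the use of defaultAutoGrid() to define the blocks, and the variable names (total, nnz, global_idx) are only for illustration.

```r
library(DelayedArray)

# Toy input for illustration only; a real use case would point 'x' at a
# file-backed DelayedArray (e.g. an HDF5Array or TileDBArray object).
arr <- array(rbinom(2*3*4, 1, 0.5), dim=c(2,3,4))
x <- DelayedArray(arr)

# Cut 'x' into a grid of blocks using the default automatic block size.
grid <- defaultAutoGrid(x)

total <- 0   # running sum of the non-zero values
nnz <- 0L    # running count of the non-zero values
for (bid in seq_along(grid)) {
    viewport <- grid[[bid]]
    # Assumption: this returns a SparseArraySeed in the installed version.
    block <- read_block(x, viewport, as.sparse=TRUE)
    vals <- block@nzdata    # non-zero values of this block only
    idx <- block@nzindex    # their coordinates, relative to the block
    # Shift block-relative coordinates to coordinates in the full array.
    global_idx <- sweep(idx, 2, start(viewport) - 1L, "+")
    # ... per-block computation on 'vals' and 'global_idx' goes here ...
    total <- total + sum(vals)
    nnz <- nnz + length(vals)
}
```

Only one block's nzdata and nzindex are in memory at a time, so the peak memory use depends on the block size rather than on the size of the whole array.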