I implemented some functions based on block processing, referring to the write_block() documentation.
https://rdrr.io/bioc/DelayedArray/man/write_block.html
I can now control the memory usage of my function.
However, when the data is large, write_block() itself accounts for much of the run time and becomes the bottleneck.
After calling HDF5Array::setHDF5DumpCompressionLevel(0L) to write the data uncompressed,
the run time improved considerably, but I would still like to make it a little faster.
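For context, here is a minimal sketch of the pattern I mean. It is not my actual code: the array size, the block length, and the "multiply by 2" step are placeholders, and it assumes a DelayedArray/HDF5Array version recent enough to provide defaultAutoGrid() and a write_block() that returns the sink.

```r
library(DelayedArray)
library(HDF5Array)

# Write uncompressed, as described above
setHDF5DumpCompressionLevel(0L)

# Placeholder input: a small 3D array wrapped as a DelayedArray
x <- DelayedArray(array(runif(2000), dim = c(20, 10, 10)))

# On-disk sink with the same dimensions as the input
sink <- HDF5RealizationSink(dim(x), type = "double")

# One grid used for both reading and writing (block length is a placeholder)
grid <- defaultAutoGrid(x, block.length = 500)

for (bid in seq_along(grid)) {
    viewport <- grid[[bid]]
    block <- read_block(x, viewport)            # load one block into memory
    block <- block * 2                          # placeholder computation
    sink <- write_block(sink, viewport, block)  # the step that is slow for me
}
close(sink)
res <- as(sink, "DelayedArray")
```

Memory stays bounded by the block size, but every iteration pays the cost of one write_block() call.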
Do you know of any tips for speeding up write_block()?
Here is what I'm currently considering.
1. Sparse Array:
I thought I could speed up the process by using sparse multi-dimensional arrays that only store non-zero values.
However, although sparse matrices are defined in the Matrix package, I don't know of any good implementation of sparse multi-dimensional arrays in R.
Also, I found that sparse2dense() is called inside write_block().
Does this mean that even if I use SparseArraySeed or as.sparse=TRUE, it will be forced to convert to a dense array when writing to the HDF5 file?
2. Chunk Size:
To begin with, does the "chunk_dim" option in writeHDF5Array mean the same thing as "chunk size" in HDF5 files?
https://support.hdfgroup.org/HDF5/doc/Advanced/Chunking/
I found that the chunk_dim option is automatically set by getHDF5DumpChunkDim() and seems to be specified by the dimension of the block extracted as the viewport.
Am I understanding this correctly?
https://bioc.ism.ac.jp/packages/3.7/bioc/manuals/HDF5Array/man/HDF5Array.pdf
Also, could adjusting the value of chunk_dim improve the speed of write_block()?
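To make the question concrete, this is the kind of experiment I have in mind. It is only a sketch: the dimensions and the chunk geometry are placeholder values, and I'm assuming a recent HDF5Array in which the argument is spelled chunkdim (the older manual linked above calls it chunk_dim) and getHDF5DumpChunkDim() takes only the dimensions.

```r
library(DelayedArray)
library(HDF5Array)

dims <- c(100L, 50L, 20L)   # placeholder dimensions

# Chunk geometry that would be chosen automatically for these dimensions
auto_chunkdim <- getHDF5DumpChunkDim(dims)

# Sink with an explicitly chosen chunk geometry and no compression
sink <- HDF5RealizationSink(dims, type = "double",
                            chunkdim = c(100L, 50L, 1L), level = 0L)
close(sink)

# Check which chunk geometry the on-disk dataset actually got
h5 <- as(sink, "DelayedArray")
chunkdim(h5)
```

Timing the write loop with a few different chunkdim values (for example, chunks aligned with the blocks versus the automatic choice) would show whether the chunk geometry matters for write_block() speed.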
3. Parallel Computing:
I found that the functions blockApply() and blockReduce() are implemented in DelayedArray and are BiocParallel-based.
Can write_block() be made faster by using these functions?
Would both the computation on each block and the writing of the results to the HDF5 file then be performed in multiple processes?
I think parallel I/O in HDF5 is still a difficult problem, though.
Besides, does parallel computing with 2 processes mean less than a 2x speedup, but 2x the memory usage?
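As a small illustration of what I mean by the computation part, here is a sketch with blockApply() and two workers. The matrix, the grid, and the per-block sum are placeholders, and I picked SnowParam only because it works on every platform.

```r
library(DelayedArray)
library(BiocParallel)

# Placeholder input: 100 x 100 matrix wrapped as a DelayedArray
x <- DelayedArray(matrix(runif(10000), nrow = 100))

# Four blocks of 25 columns each
grid <- colAutoGrid(x, ncol = 25)

# Per-block computation distributed over 2 worker processes
block_sums <- blockApply(x, function(block) sum(block),
                         grid = grid,
                         BPPARAM = SnowParam(workers = 2L))

total <- sum(unlist(block_sums))
```

This only parallelizes the per-block computation. Nothing is written to HDF5 here, so it does not tell me whether the write_block() calls themselves can safely run in several processes.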
Best,
Koki
Can someone please respond to this when you have time?