I am using DelayedArray and HDF5Array inside my package. Some of my functions call getAutoBlockSize internally so that the blocks they process do not exceed the block size specified by setAutoBlockSize, and then use HDF5Array::write_block as documented. Each calculation is done sequentially.
https://rdrr.io/bioc/DelayedArray/man/write_block.html
By the way, is it safe to assume that all the functions implemented in DelayedArray and HDF5Array are basically block size aware?
For example, my functions use the following functions, and I haven't written any block-processing code explicitly. Can I assume that they recognize the block size and process the data sequentially, block by block?
I couldn't find any place in the code where getAutoBlockSize is called explicitly, though.
DelayedArray::aperm
DelayedArray::realize
HDF5Array::writeHDF5Array
HDF5Array::ReshapedHDF5Array
Also, I would like to know if there is a way to find out whether a given function is block-size aware or not. If there is a list somewhere, it would be helpful.
Koki

Very interesting.
So, if I run a delayed operation without performing the actual calculation, simply stacking the operations, and then realize the calculation after that, can I assume that the writes to the file (e.g. HDF5) that are required during the calculation are block-size aware?
For example, the following code uses a combination of delayed operations and block-processed operations, but the code as a whole does not exceed the block size. Is that correct?
Also, I think that even a simple delayed operation can cause a memory error (e.g., Error: C stack usage of HDF5Array). Does that mean there is not enough memory to stack the calculation?
Koki
Yes, that's correct. More precisely:
realize(x, "HDF5Array") just calls as(x, "HDF5Array"), which just calls writeHDF5Array(x), so the three are equivalent. The workhorse behind writeHDF5Array(x) is DelayedArray::BLOCK_write_to_sink() (this is an internal helper so is not documented). As its name suggests, DelayedArray::BLOCK_write_to_sink() is block-size aware, i.e. it will define a grid of blocks on x that respects getAutoBlockSize(), walk on the blocks of that grid, and realize each block before writing them to disk.
Note however that choosing blocks that respect getAutoBlockSize() isn't a guarantee that the code won't use more memory than the block size. This is a common misconception. See the last paragraph of the Details section in ?getAutoBlockSize for more information about this.
Well, not a simple delayed operation. You need to stack tens of thousands of delayed operations on an object to end up with a "C stack usage is too close to the limit" problem. This typically happens when you apply a delayed operation in a loop, which is almost never a good idea.
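To make the "grid of blocks" idea concrete, here is a rough sketch of a block walk that uses only exported DelayedArray functions. It is not the code of the internal helper, just an illustration of the same pattern. The object names, the matrix dimensions, and the 8000-byte block size are made up for the example.

```r
library(DelayedArray)

# Toy data: an in-memory 200 x 100 matrix wrapped in a DelayedArray.
x <- DelayedArray(matrix(runif(20000), nrow = 200))

# Stack two delayed operations; nothing is computed at this point.
y <- t(log1p(x))

# Use a deliberately small block size (in bytes) so the grid has many blocks.
old_size <- getAutoBlockSize()
setAutoBlockSize(8000)

# Define a grid on y whose blocks respect getAutoBlockSize().
grid <- defaultAutoGrid(y)

# Walk on the blocks: each block is realized as an ordinary array,
# processed, and then dropped before the next one is loaded.
total <- 0
for (bid in seq_along(grid)) {
    block <- read_block(y, grid[[bid]])
    total <- total + sum(block)
}

# Restore the previous block size.
setAutoBlockSize(old_size)
```

Only one block is loaded at a time, but whatever is done with each block can still allocate temporary copies, so the caveat above about memory usage applies to this loop too.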
H.
Ok, I got the gist of it.
If I am in the same situation as before (Error: C stack usage of HDF5Array), where I have to apply delayed operations repeatedly, I'd better call realize often to avoid the "C stack usage is too close to the limit" error.
Thanks a lot.
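For the record, a minimal sketch of what I have in mind, assuming a loop that applies one delayed operation per iteration. The iteration count and the realize_every interval are hypothetical values chosen for the example, and realize(x) here uses the default backend; realize(x, "HDF5Array") would write to an HDF5 file instead.

```r
library(DelayedArray)

x <- DelayedArray(matrix(1, nrow = 10, ncol = 10))

n_iter <- 1000         # hypothetical number of iterations
realize_every <- 100   # hypothetical interval between realizations

for (i in seq_len(n_iter)) {
    # Each iteration stacks one more delayed operation on x.
    x <- x + 1
    # Realizing from time to time computes the pending operations,
    # so the stack of delayed operations never grows past realize_every.
    if (i %% realize_every == 0L) {
        x <- realize(x)
    }
}
```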