I am using DelayedArray and HDF5Array inside my package. Some of my functions call getAutoBlockSize internally so that the blocks they process do not exceed the size set with setAutoBlockSize, and then write each block with HDF5Array::write_block as documented, so each calculation is done sequentially.
https://rdrr.io/bioc/DelayedArray/man/write_block.html
By the way, is it safe to assume that all the functions implemented in DelayedArray and HDF5Array are basically block-size aware?
For example, my functions use the functions listed below, and I haven't written any explicit block-processing code around them. Can I assume that they take the block size into account and process the data block by block? I couldn't find any place in their code where getAutoBlockSize is called explicitly, though.
DelayedArray::aperm
DelayedArray::realize
HDF5Array::writeHDF5Array
HDF5Array::ReshapedHDF5Array
Also, I would like to know if there is a way to find out whether a given function is block-size aware or not. If there is a list somewhere, it would be helpful.
Koki
Very interesting.
So, if I only stack delayed operations without performing the actual calculation, and then realize the result afterwards, can I assume that the writes to the file (e.g. HDF5) required during the calculation are block-size aware?
For example, the following code combines delayed operations and block-processed operations, but the code as a whole does not exceed the block size. Is that correct?
Also, I think that even a simple delayed operation can cause a memory error (e.g., Error: C stack usage of HDF5Array). Does that mean there is not enough memory to stack the calculation?
Koki
Yes, that's correct. More precisely: realize(x, "HDF5Array") just calls as(x, "HDF5Array"), which just calls writeHDF5Array(x), so the three are equivalent. The workhorse behind writeHDF5Array(x) is DelayedArray::BLOCK_write_to_sink() (this is an internal helper, so it is not documented). As its name suggests, DelayedArray::BLOCK_write_to_sink() is block-size aware, i.e. it will define a grid of blocks on x that respects getAutoBlockSize(), walk on the blocks of that grid, and realize each block before writing it to disk.
Note however that choosing blocks that respect getAutoBlockSize() isn't a guarantee that the code won't use more memory than the block size. This is a common misconception. See the last paragraph of the Details section in ?getAutoBlockSize for more information about this.
Well, not a simple delayed operation. You need to stack tens of thousands of delayed operations on an object to end up with a "C stack usage is too close to the limit" problem. This typically happens when you apply a delayed operation in a loop, which is almost never a good idea.
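To make the grid-walking description concrete, here is a minimal sketch of the same idea written with exported functions only. It is not the actual code of BLOCK_write_to_sink(); the toy matrix, the 20000-byte block size, and the variable names are assumptions chosen for illustration.

```r
library(DelayedArray)
library(HDF5Array)

# Toy in-memory data wrapped as a DelayedArray, plus one delayed operation.
x <- DelayedArray(matrix(runif(200 * 50), nrow = 200, ncol = 50))
y <- log1p(x)  # delayed: nothing is computed yet

# Deliberately small block size (in bytes) so that the grid has several blocks.
# This value is an assumption for the example, not a recommendation.
old_size <- getAutoBlockSize()
setAutoBlockSize(20000)

# Grid of blocks on 'y' that respects getAutoBlockSize().
grid <- defaultAutoGrid(y)

# On-disk sink with the same dimensions and type as 'y'.
sink <- HDF5RealizationSink(dim(y), type = type(y))

# Walk on the grid: realize one block in memory, then write it to the sink.
for (bid in seq_along(grid)) {
    viewport <- grid[[bid]]
    block <- read_block(y, viewport)            # delayed ops are applied here
    sink <- write_block(sink, viewport, block)  # block goes to the HDF5 file
}
close(sink)
res <- as(sink, "DelayedArray")

# Restore the previous block size.
setAutoBlockSize(old_size)
```

Only one block of y is held as an ordinary array at a time, but as noted above, the delayed operations applied while reading a block can allocate temporary copies, so peak memory may still exceed the block size.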
H.
Ok, I got the gist of it.
If I am in the same situation as before (Error: C stack usage of HDF5Array), where I have to apply delayed operations repeatedly, I'd better call realize often to avoid the "C stack usage is too close to the limit" error.
Thanks a lot.
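A minimal sketch of that pattern, assuming a loop that adds one delayed operation per iteration. The iteration count and the checkpoint interval are hypothetical values for illustration; a real interval would need to be tuned, since every realize call writes a new HDF5 dataset.

```r
library(DelayedArray)
library(HDF5Array)

x <- DelayedArray(matrix(1, nrow = 10, ncol = 5))

n_iter <- 100        # hypothetical number of iterations
realize_every <- 25  # hypothetical checkpoint interval

for (i in seq_len(n_iter)) {
    x <- x + 1  # each iteration stacks one more delayed operation on 'x'
    if (i %% realize_every == 0) {
        # Write the current state to HDF5 block by block. The result is a
        # fresh HDF5Array that carries no pending delayed operations.
        x <- realize(x, "HDF5Array")
    }
}
```

Where the computation allows it, restructuring the code so that the delayed operation is not applied inside a loop at all avoids both the deep stack and the repeated writes.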