DelayedArray::read_sparse_block
Koki (@koki-7888), Japan

It seems that a function, read_sparse_block(), was implemented 3 months ago, but I don't know the details.

cf. https://github.com/Bioconductor/DelayedArray/blob/5544cc3e06d66c209c5394f675805d7ab6890f03/man/read_sparse_block.Rd

Is this simply the same as read_block() with the as.sparse=TRUE option, or is it an extraction function for SparseArraySeed objects?

@herve-pages-1542
Seattle, WA, United States

Hi Koki,

Sorry again for the slow response.

read_sparse_block() has been around for years. It's a generic function defined in the DelayedArray package, with several methods defined in downstream packages like HDF5Array or TileDBArray. It is not meant to be called directly. You should always call read_block() instead. See ?S4Arrays::read_block for more information (read_block has moved from DelayedArray to S4Arrays).
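A minimal sketch of the recommended route, assuming HDF5Array is installed (it attaches DelayedArray, which in turn brings in S4Arrays and SparseArray). The toy matrix, the temporary TENx-style HDF5 file, and the block.length value are arbitrary choices for illustration; the class of the returned block depends on the installed package versions.

```r
## Assumes Bioconductor package HDF5Array is installed.
library(HDF5Array)

## Toy data: a mostly-zero 1000 x 500 matrix with 5000 random nonzero values.
set.seed(123)
m <- matrix(0, nrow = 1000, ncol = 500)
m[sample(length(m), 5000)] <- runif(5000)

## Write it to a temporary 10x Genomics-style HDF5 file.
## The result is a TENxMatrix, i.e. a DelayedMatrix with a sparse on-disk seed.
x <- writeTENxMatrix(m)
is_sparse(x)

## Split x into blocks of at most 1e5 elements and take the first viewport.
grid <- defaultAutoGrid(x, block.length = 1e5)
viewport <- grid[[1L]]

## Public entry point: read_block() with as.sparse = TRUE.
## Any read_sparse_block() / read_block_as_sparse() method is dispatched internally.
block <- read_block(x, viewport, as.sparse = TRUE)
class(block)  # version-dependent sparse class
dim(block)    # same as dim(viewport)
```

Calling read_block() keeps client code independent of which internal generic (read_sparse_block() or read_block_as_sparse()) the backend implements.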

That being said, I recently introduced the read_block_as_sparse() generic in the SparseArray package. It will be a replacement for read_sparse_block(). The difference is that read_block_as_sparse() will return a SparseArray object (typically an SVT_SparseArray), whereas read_sparse_block() returns a SparseArraySeed object. This change is part of the plan to use the new and efficient SVT_SparseArray representation everywhere internally in the DelayedArray framework to handle sparse data, instead of the old and inefficient SparseArraySeed representation. This is a work-in-progress. See https://github.com/Bioconductor/DelayedArray/blob/devel/TODO for a detailed roadmap for this transition.

In other words, this is DelayedArray's internal business and the impact on downstream packages and other client code should be minimal.
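To make the two sparse representations concrete, here is a small sketch using the SparseArray package. It assumes a recent SparseArray release, where the coordinate (COO) layout used by SparseArraySeed is provided by the COO_SparseArray class and SVT stands for "sparse vector tree"; treat the exact class names as version-dependent. It only shows that both layouts hold the same data and makes no claim about which is faster or smaller for a given dataset.

```r
## Assumes Bioconductor package SparseArray is installed.
library(SparseArray)

## Toy data: a mostly-zero 1000 x 500 matrix with 5000 random nonzero values.
set.seed(1)
m <- matrix(0, nrow = 1000, ncol = 500)
m[sample(length(m), 5000)] <- runif(5000)

## SVT layout: nonzero values stored per column in a tree of sparse vectors.
svt <- as(m, "SVT_SparseArray")

## COO layout: a matrix of nonzero coordinates plus a vector of values.
coo <- as(svt, "COO_SparseArray")

## Both describe the same data.
nzcount(svt)
nzcount(coo)

## Compare the in-memory footprint of the two layouts on this toy data.
object.size(svt)
object.size(coo)
```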

Best,

H.


Thanks for your reply.

So do you mean that the change from DelayedArray's read_sparse_block() to SparseArray's read_block_as_sparse() will speed up input/output of sparse arrays to/from HDF5?

Will this feature be available in DelayedArray in the upcoming BioC 3.18? https://github.com/Bioconductor/DelayedArray/blob/devel/TODO


> So do you mean that the change from DelayedArray's read_sparse_block() to SparseArray's read_block_as_sparse() will speed up input/output of sparse arrays to/from HDF5?

Calculation speed depends on many factors: the size of the data, the size of the blocks, the sparsity of the blocks, the operations performed on the blocks, available memory, disk speed, etc., so I don't want to promise anything. The hope is that we will see some speed improvements in some situations, but not necessarily in all situations involving sparse data.

> Will this feature be available in DelayedArray in the upcoming BioC 3.18? https://github.com/Bioconductor/DelayedArray/blob/devel/TODO

I'm lagging a little behind the initial plan, so I can't promise that either.

Best,

H.


OK, I understand.

How about "writing" to HDF5?

Are you going to implement write_block_as_sparse() in the SparseArray package?


No. But I will need to modify the write_block() method for TENxRealizationSink implemented in HDF5Array to make it handle blocks that are SVT_SparseArray objects. I don't expect any significant performance improvement from that though.

H.

@james-w-macdonald-5106
United States

The function itself is in the same GitHub repository you are referencing, under the R subdirectory.
