Dear BioC developers and community,
I am using more and more `DataFrame`s of `Rle` values, typically for transcriptome expression data, and I end up writing more and more functions that take a `DataFrame`, `lapply` a function that unpacks the `Rle`, apply a second function, repack the `Rle`, and convert the resulting list into a `DataFrame`. I was just wondering (actually, I searched and did not find) whether there are already classes or packages providing such functionality, or providing methods such as `colSums`, `rowsum`, `cor`, etc., adapted to be efficient in that context.
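For readers unfamiliar with the pattern described above, here is a minimal sketch of it, assuming only the S4Vectors package. The helper name `rleColApply` and the toy data are hypothetical and are not part of any package.

```r
library(S4Vectors)

# Toy expression table: two samples stored as run-length encoded columns.
DF <- DataFrame(
  sample1 = Rle(c(0L, 0L, 0L, 5L, 5L, 1L)),
  sample2 = Rle(c(2L, 2L, 0L, 0L, 0L, 0L))
)

# Hypothetical helper: unpack each Rle column, apply FUN to the plain
# vector, repack the result as an Rle, and rebuild a DataFrame.
rleColApply <- function(DF, FUN, ...) {
  cols <- lapply(DF, function(x) Rle(FUN(decode(x), ...)))
  do.call(DataFrame, cols)
}

# Example: scale each sample so that its values sum to 1.
normalised <- rleColApply(DF, function(v) v / sum(v))
```

Only one column is decoded at a time, so the full dense table never has to exist in memory.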
Have a nice day!
Thanks a lot Hervé! It took me some time to understand the obvious, but the `DelayedArray` wrappers are exactly what I needed. Would you recommend that I wrap in `DelayedArray`s just before performing matrix-like operations, or that I use the `DelayedArray` class as the base class for the assays in the `SummarizedExperiment` objects that I produce? (The background of my questions is that I am refactoring the `CAGEr` package to use `MultiAssayExperiment`s, `SummarizedExperiment`s and `DataFrame`s of `Rle`s extensively.)

Hi Hervé, I have been using `rowSums(DelayedArray(DF))` for almost 6 years now, but this week I got curious about performance and ran a benchmark. Interestingly, it is much faster to `decode` the values and sum them than to wrap the `DataFrame` in a `DelayedArray`, or to sum the `Rle` values without decoding them. I hope this is useful to you and others. Interestingly, ChatGPT did not give working code because it confused `runValue` and `decode`...

Hi Charles,
Thanks for the feedback. Operating _natively_ on the DF of Rle objects will always be more efficient than wrapping the object first in a DelayedArray object. The latter is only a quick and easy way to expedite things by getting access to all the operations supported by DelayedArray objects in general. However nothing replaces operations that are implemented to work directly on a specific type of DelayedArray seed.
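The routes compared in this thread can be sketched as follows, assuming the S4Vectors and DelayedArray packages are installed. The toy data and variable names are illustrative only, and no timings are claimed here; wrap each call in `system.time()` on your own data to compare them.

```r
library(S4Vectors)
library(DelayedArray)

# Toy sparse-ish count data: three samples as Rle columns.
set.seed(1)
DF <- DataFrame(
  s1 = Rle(rpois(1000, 0.1)),
  s2 = Rle(rpois(1000, 0.1)),
  s3 = Rle(rpois(1000, 0.1))
)

# (a) Generic route: wrap the DataFrame and use the DelayedArray method.
sums_delayed <- rowSums(DelayedArray(DF))

# (b) Native route: decode one column at a time and accumulate.
sums_decoded <- Reduce(`+`, lapply(DF, decode))

# (c) Add the Rle columns without decoding, then decode the result once.
sums_rle <- decode(Reduce(`+`, as.list(DF)))
```

All three should return the same row sums; they differ only in how much work is done on the run-length encoded representation.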
Note that these "native operations" must be careful to avoid expanding all the Rle's in the DF _at once_. This is easy to do with `rowSums()`, but is sometimes a little bit less straightforward, as in the case of `rowVars()`.

Best,
H.
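As an illustration of the point about not expanding every column at once, here is one possible sketch of a row-variance computation that decodes a single column per iteration. It uses a Welford-style running update across columns. The function name `rleRowVars` is hypothetical, and this is not how any existing package implements `rowVars()`.

```r
library(S4Vectors)

# Hypothetical helper: row variances of a DataFrame of Rle columns,
# expanding only one column at a time.
rleRowVars <- function(DF) {
  stopifnot(ncol(DF) >= 2L)
  n <- 0L
  mean_acc <- numeric(nrow(DF))  # running mean of each row
  m2_acc <- numeric(nrow(DF))    # running sum of squared deviations
  for (j in seq_len(ncol(DF))) {
    x <- decode(DF[[j]])         # the only dense vector held in memory
    n <- n + 1L
    delta <- x - mean_acc
    mean_acc <- mean_acc + delta / n
    m2_acc <- m2_acc + delta * (x - mean_acc)
  }
  m2_acc / (n - 1)               # sample variance, as in var()
}

set.seed(42)
DF <- DataFrame(
  s1 = Rle(rpois(50, 0.5)),
  s2 = Rle(rpois(50, 0.5)),
  s3 = Rle(rpois(50, 0.5)),
  s4 = Rle(rpois(50, 0.5))
)
row_vars <- rleRowVars(DF)
```

The same loop with a single accumulator (`acc <- acc + x`) gives a `rowSums()` equivalent, which is why that case is the easy one.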