rowsum or similar for RleArray?
maltethodberg (@maltethodberg-9690), Denmark

I'm experimenting with the new RleArrays from DelayedArray, and I was happy to see that most operations work seamlessly, just as they do on a normal matrix.

I often use the very fast rowsum function to aggregate the rows of a matrix by group, but this does not seem to work on an RleArray. Is there an alternative to the rowsum function for RleArrays?
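For readers unfamiliar with rowsum(): it sums, column by column, the rows of a matrix that share the same group label, and returns one row per distinct group (sorted, with the default reorder=TRUE). A small base-R illustration, using a made-up matrix and grouping:

```r
## Base-R illustration of rowsum(): rows sharing a 'group' label are summed
## column by column; the result has one row per distinct group.
m <- matrix(1:12, nrow = 4)       # 4 rows x 3 columns, filled column-wise
group <- c("b", "a", "b", "a")    # made-up grouping of the 4 rows

res <- rowsum(m, group)
res
##   [,1] [,2] [,3]
## a    6   14   22
## b    4   12   20
```

Row "a" is the sum of rows 2 and 4 of m, and row "b" the sum of rows 1 and 3.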

Aaron Lun (@alun), The city by the bay

The closest I can think of is some work by Peter Hickey: https://github.com/PeteHaitch/DelayedMatrixStats. I don't know if this is applicable to RleArray objects, though.

@herve-pages-1542, Seattle, WA, United States

Hi,

There is no rowsum method for DelayedMatrix objects yet, but there is a rowSums method. In the meantime, here is a quick and dirty implementation of rowsum that doesn't support the reorder or na.rm arguments (unlike rowsum.default and rowsum.data.frame, which do):

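## 'reorder' is accepted for compatibility with rowsum() but ignored here
rowsum.DelayedMatrix <- function(x, group, reorder=TRUE, ...)
    ## split each column by 'group' and sum each piece
    apply(x, 2, function(col) sum(splitAsList(col, group)))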

Then:

library(DelayedArray)
M <- as(matrix(runif(100000), nrow=2000), "RleArray")
group <- sample(8, 2000, replace=TRUE)
identical(rowsum(M, group), rowsum(as.matrix(M), group))
# [1] TRUE

It's not as fast as rowsum() on an ordinary matrix, though.

Cheers,

H.
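One possible way to recover some of that speed, offered as an untested assumption rather than a recommendation: realize the DelayedMatrix one block of columns at a time and pass each block to the base rowsum(), which does the summing in compiled code. The name blockwise_rowsum and its block_ncol argument are hypothetical. x only needs to support nrow(), ncol(), [ subsetting and as.matrix(), which ordinary matrices and DelayedMatrix objects (including RleArray) provide.

```r
## Hypothetical helper: rowsum() over column blocks of a matrix-like 'x'.
## Assumes ncol(x) >= 1; only one block is realized in memory at a time.
blockwise_rowsum <- function(x, group, block_ncol = 100L) {
  stopifnot(length(group) == nrow(x))
  starts <- seq.int(1L, ncol(x), by = block_ncol)
  pieces <- lapply(starts, function(s) {
    j <- s:min(s + block_ncol - 1L, ncol(x))
    ## realize this block as an ordinary matrix, then use base rowsum()
    rowsum(as.matrix(x[, j, drop = FALSE]), group)
  })
  ## every piece has the same (sorted) group rows, so bind them column-wise
  do.call(cbind, pieces)
}
```

With M and group from the example above, blockwise_rowsum(M, group) should give the same result as rowsum(as.matrix(M), group) without realizing the whole matrix at once. Larger blocks mean fewer R-level iterations but more memory per block.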

@charles-plessy-7857, Japan

If a DataFrame of Rles works for you, you might be interested in my benchmark, where I concluded that rowsum(as.data.frame(lapply(DF, decode)), group) is the fastest option.
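A minimal sketch of that approach on a toy DataFrame of Rle columns. The data and grouping are made up for illustration; DataFrame(), Rle() and decode() come from S4Vectors.

```r
library(S4Vectors)

## Toy DataFrame with Rle columns (made-up data)
DF <- DataFrame(a = Rle(c(1, 1, 2, 2)), b = Rle(c(5, 5, 5, 0)))
group <- c("x", "y", "x", "y")

## decode() expands each Rle column into an ordinary vector, giving a plain
## data.frame that base rowsum() can aggregate
res <- rowsum(as.data.frame(lapply(DF, decode)), group)
res
##   a  b
## x 3 10
## y 3  5
```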


Note that you can simply do as.data.frame(DF) to turn DF into an ordinary data.frame.

However, it's important to keep in mind that converting a DataFrame of Rles to an ordinary data.frame in order to operate on it kind of defeats the purpose of using a DataFrame of Rles in the first place.

Edit: I forgot to mention that a rowsum() method for DelayedArray objects (including RleArray objects) was implemented a while ago in the DelayedArray package.
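Assuming a DelayedArray release recent enough to include that method, rowsum() can be called on an RleArray directly, with no hand-written method needed. A sketch reusing the earlier test setup; wrapping the result in as.matrix() is only a precaution in case the method does not return an ordinary matrix:

```r
library(DelayedArray)

M <- as(matrix(runif(100000), nrow = 2000), "RleArray")
group <- sample(8, 2000, replace = TRUE)

## dispatches to the rowsum() method provided by DelayedArray
res <- rowsum(M, group)

## compare against base rowsum() on the realized matrix
all.equal(as.matrix(res), rowsum(as.matrix(M), group))
```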
