Convert S4 DataFrame of Rle objects to sparse matrix
@koen-van-den-berge-6369
Ghent University, Belgium

In R, I have an S4 DataFrame whose columns are Rle-encoded vectors.
The data can be simulated using the following code:

    library(S4Vectors)
    x = DataFrame(Rle(1:10),Rle(11:20),Rle(21:30))

Now I want to convert this DataFrame to a sparse matrix from the Matrix package. With an ordinary data.frame, one can do

    Matrix(x,sparse=TRUE)

However, this does not work for DataFrames, as it gives the following error:

    Error in as.vector(data) : 
      no method for coercing this S4 class to a vector

Also, Matrix(as.data.frame(x)) does not work as it gives the following error:

    Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix

Any ideas on how to convert between data types in a rather efficient way?

Thanks!

@michael-lawrence-3846
United States

Direct construction of Matrix objects from data.frame objects does not even work in base R:

    > Matrix(as.data.frame(mtcars))
    Error in isN0(as(m, "matrix")) :
      (list) object cannot be coerced to type 'double'

And coercing a DataFrame of Rle objects to a data.frame (and thus the Rle objects to ordinary vectors) defeats the purpose of the run-length encoding.
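To illustrate the point about run-length encoding, here is a small sketch. The vector `r` is made up for the example; it only shows that an Rle stores runs rather than individual elements, and that expanding it materialises every element.

```r
library(S4Vectors)

# One run of a million zeros: stored as a single (value, length) pair
r <- Rle(0, 1000000L)
nrun(r)      # number of runs actually stored
length(r)    # logical length of the encoded vector

# Expanding to an ordinary vector materialises every element
v <- as.vector(r)
length(v)

# Compare the in-memory footprint of the two representations
object.size(r)
object.size(v)
```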

I know there is at least one RleDataFrame class floating around, and it should probably make its way into S4Vectors.

Once we have that, it would be nice to have a direct, efficient route from a table of Rle objects to a sparse matrix encoding, i.e., one that takes into account the zeros explicitly.

But if you really need this coercion today, and don't mind expanding the Rle objects, then:

    > Matrix(as.matrix(as.data.frame(x)))

should do the trick.


Hi Michael,

The proposed code does work, but it is very inefficient, as you mention yourself. I was hoping for a more efficient conversion between the two data types. The conversion you propose causes memory issues for me when allocating the dense matrix.


I will work on this.


Is there an answer to this question?


Never got around to doing anything here. The S4Vectors package does not depend on Matrix, so I am not sure where this should go, but here is a simple way to go from Rle to Matrix (assuming the DataFrame is called "df"):

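    library(IRanges)  # also loads S4Vectors; provides ranges() for Rle
    library(Matrix)

    # Rle -> one-column sparse matrix
    setAs("Rle", "Matrix", function(from) {
        rv <- runValue(from)
        nz <- rv != 0                           # runs holding non-zero values
        i <- as.integer(ranges(from)[nz])       # row indices covered by those runs
        x <- rep(rv[nz], runLength(from)[nz])   # the non-zero values, expanded
        sparseMatrix(i=i, p=c(0L, length(x)), x=x)
    })

    # DataFrame of Rle columns -> sparse matrix, built column by column
    setAs("DataFrame", "Matrix", function(from) {
        do.call(cbind, lapply(from, as, "Matrix"))
    })

    as(df, "Matrix")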

Thanks for the quick response.


Just one little bug fix to make sure the number of rows is correct if the Rle ends with zeros:

    #' Convert from Rle to one column matrix
    #'
    setAs("Rle", "Matrix", function(from) {
        rv <- runValue(from)
        nz <- rv != 0
        i <- as.integer(ranges(from)[nz])
        x <- rep(rv[nz], runLength(from)[nz])
        sparseMatrix(i=i, p=c(0L, length(x)), x=x,
                     dims=c(length(from), 1))
    })

    #' Convert from DataFrame of Rle to sparse Matrix
    #'
    setAs("DataFrame", "Matrix", function(from) {
        mat = do.call(cbind, lapply(from, as, "Matrix"))
        colnames(mat) <- colnames(from)
        rownames(mat) <- rownames(from)
        mat
    })
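
For what it's worth, the same idea can also be sketched as a single `sparseMatrix()` call on (i, j, x) triplets, without registering coercion methods or calling `cbind` per column. This is only an illustration under some assumptions: `rleDataFrameToSparse` is a hypothetical helper name (it is not part of S4Vectors or Matrix), every column is assumed to be a numeric Rle without NA values, and the example DataFrame `df` is made up.

```r
library(S4Vectors)
library(Matrix)

# Hypothetical helper, not part of S4Vectors or Matrix.
# Assumes every column of `df` is a numeric Rle with no NA values.
rleDataFrameToSparse <- function(df) {
  triplets <- lapply(seq_len(ncol(df)), function(j) {
    col <- df[[j]]
    rv <- runValue(col)
    rl <- runLength(col)
    nz <- rv != 0                    # runs holding non-zero values
    run_end <- cumsum(rl)            # last row covered by each run
    run_start <- run_end - rl + 1L   # first row covered by each run
    # Row indices covered by the non-zero runs only
    i <- as.integer(unlist(Map(seq.int, run_start[nz], run_end[nz]),
                           use.names = FALSE))
    list(i = i, j = rep.int(j, length(i)), x = rep(rv[nz], rl[nz]))
  })
  sparseMatrix(
    i = as.integer(unlist(lapply(triplets, `[[`, "i"))),
    j = as.integer(unlist(lapply(triplets, `[[`, "j"))),
    x = as.numeric(unlist(lapply(triplets, `[[`, "x"))),
    dims = c(nrow(df), ncol(df)),    # keeps trailing all-zero rows
    dimnames = list(rownames(df), colnames(df))
  )
}

# Made-up example with leading and trailing zeros
df <- DataFrame(a = Rle(c(0, 0, 3, 3, 0)), b = Rle(c(1, 0, 0, 0, 0)))
m <- rleDataFrameToSparse(df)
m
```

Passing `dims` explicitly serves the same purpose as the bug fix above: the number of rows comes from the DataFrame rather than from the largest non-zero row index.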