convert S4 DataFrame of Rle objects to sparse matrix
1
1
Entering edit mode
@koen-van-den-berge-6369
Last seen 11 weeks ago
Ghent University, Belgium

 

In the R language, I have an S4 DataFrame consisting of Rle encoded elements.
The data can be simulated using following code

    x = DataFrame(Rle(1:10),Rle(11:20),Rle(21:30))

Now, I want to convert this DataFrame to a sparse matrix from the Matrix package. On a usual data.frame, one can do

    Matrix(x,sparse=TRUE)

However, this does not work for DataFrames, as it gives the following error:

`Error in as.vector(data) : `
 ` no method for coercing this S4 class to a vector`

Also, Matrix(as.data.frame(x)) does not work as it gives the following error:

Error in asMethod(object) : invalid class 'NA' to dup_mMatrix_as_geMatrix 

Any ideas on how to convert between data types in a rather efficient way?

Thanks!

s4 • 3.5k views
ADD COMMENT
2
Entering edit mode
@michael-lawrence-3846
Last seen 6 weeks ago
United States

Direct construction of Matrix objects from data.frame objects does not even work in base R:

> Matrix(as.data.frame(mtcars))
Error in isN0(as(m, "matrix")) : 
  (list) object cannot be coerced to type 'double'

And coercing a DataFrame of Rle objects to a data.frame (and thus the Rle objects to ordinary vectors) defeats the purpose of the run-length encoding.

I know there is at least one RleDataFrame class floating around, and it should probably make its way into S4Vectors.

Once we have that, it would be nice to have a direct, efficient route from a table of Rle objects to a sparse matrix encoding, i.e., one that takes into account the zeros explicitly.

But if you really need this coercion today, and don't mind expanding the Rle objects, then:

> Matrix(as.matrix(as.data.frame(x)))

should do the trick.

 

ADD COMMENT
0
Entering edit mode

Hi Michael,

The proposed code indeed does work, however it is very inefficient, as you already mention yourself. I was hoping for a more efficient conversion between the two data types. The conversion you propose gives me memory issues for allocating the regular matrix.

 

ADD REPLY
0
Entering edit mode

I will work on this. 

ADD REPLY
0
Entering edit mode

Is there an answer to this question?

ADD REPLY
2
Entering edit mode

Never got around to doing anything here. The S4Vectors package does not depend on Matrix, so I am not sure where this should go, but here is a simple way to go from Rle to Matrix (assuming the DataFrame is called "df"):

setAs("Rle", "Matrix", function(from) {
    rv <- runValue(from)
    nz <- rv != 0
    i <- as.integer(ranges(from)[nz])
    x <- rep(rv[nz], runLength(from)[nz])
    sparseMatrix(i=i, p=c(0L, length(x)), x=x)
})

setAs("DataFrame", "Matrix", function(from) {
    do.call(cbind, lapply(from, as, "Matrix"))
})

as(df, "Matrix")
ADD REPLY
0
Entering edit mode

Thanks for quick response.

ADD REPLY
2
Entering edit mode

Just one little bug fix to make sure length is correct if Rle ends with zeros:

#' Convert from Rle to one column matrix
#'
setAs("Rle", "Matrix", function(from) {
    rv <- runValue(from)
    nz <- rv != 0
    i <- as.integer(ranges(from)[nz])
    x <- rep(rv[nz], runLength(from)[nz])
    sparseMatrix(i=i, p=c(0L, length(x)), x=x,
                 dims=c(length(from), 1))
})

#' Convert from DataFrame of Rle to sparse Matrix
#'
setAs("DataFrame", "Matrix", function(from) {
  mat = do.call(cbind, lapply(from, as, "Matrix"))
  colnames(mat) <- colnames(from)
  rownames(mat) <- rownames(from)
  mat
})
ADD REPLY

Login before adding your answer.

Traffic: 218 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6