Test whether a Bioconductor class is linear or rectangular in base R
@teun-van-den-brand-13039
Amsterdam

Hi everyone,

Ideally, I would like a test, using only base R generics, that tells me whether a Bioconductor S4 class is linear (like the base R vectors) or rectangular/multidimensional.

Which cases am I likely to miss if I use is.null(dim(x)) as the test for whether a class is linear?

It seems to work in these cases:

suppressPackageStartupMessages({
  library(S4Vectors)
  library(DelayedArray)
  library(SummarizedExperiment)
})

is.null(dim(Rle()))
#> [1] TRUE
is.null(dim(Factor("A")))
#> [1] TRUE
is.null(dim(GRanges()))
#> [1] TRUE
is.null(dim(DataFrame()))
#> [1] FALSE
is.null(dim(DelayedArray(seed = matrix())))
#> [1] FALSE
is.null(dim(SummarizedExperiment()))
#> [1] FALSE


Created on 2020-02-25 by the reprex package (v0.3.0)

@herve-pages-1542
Seattle, WA, United States

This works except for SplitDataFrameList objects, which are something of an anomaly in the ecosystem:

> SDF <- split(DataFrame(aa=1:9), rep(1:3, 4:2))
> dim(SDF)
  [,1] [,2]
1    4    1
2    3    1
3    2    1
> nrow(SDF)
1 2 3 
4 3 2 
> ncol(SDF)
1 2 3 
1 1 1 


One might argue that these objects are linear. They derive from List, which is linear, and their length is the number of list elements:

> length(SDF)
[1] 3


Note that even rectangular or multidimensional objects can be seen as linear along one of their dimensions. For example, an ordinary data.frame can be seen as linear along its 2nd dimension, that is, as a vector of columns. Its elements are its columns and its length is its number of columns. This is consistent with the fact that data.frame inherits from list. In fact, as with a list, 1D-style subsetting is supported (i.e. subsetting with a single subscript i, as in x[i], a.k.a. linear subsetting):

> df <- data.frame(aa=1:4, bb=letters[1:4], cc=11:14)
> df[c(3, 1)]
  cc aa
1 11  1
2 12  2
3 13  3
4 14  4


So, technically speaking, that makes them linear objects.

HOWEVER, one could argue that seeing a data.frame as a vector of columns is not the most intuitive way to look at it. One might prefer to think of a data.frame as a vector of rows. In other words, one might prefer to see a data.frame as a linear object along its 1st dimension.

Same metaphysical questions for matrix objects. Their length is the number of matrix elements and they also support 1D-style subsetting (this will subset the underlying vector), which also makes them linear objects, technically speaking. Even though one might prefer to think of a matrix as a vector of columns... or maybe as a vector of rows.
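The matrix case can be checked in base R alone. A small illustration (the object name m is arbitrary):

```r
m <- matrix(1:6, nrow = 2)

dim(m)       # non-NULL, so is.null(dim(m)) is FALSE
length(m)    # number of matrix elements, here 6
m[c(1, 4)]   # 1D-style subsetting of the underlying vector, here 1 4
```

So a matrix fails the is.null(dim(x)) test while still behaving like a plain vector under length() and single-subscript [.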

FWIW, in the Vector framework (implemented in S4Vectors), data-frame-like objects (e.g. data.frame, DataFrame, TransposedDataFrame, etc...) and matrix-like objects (e.g. matrix, SummarizedExperiment, DelayedMatrix, etc...) are always considered linear along their 1st dimension. This is reflected by how things like extractROWS(), replaceROWS(), and bindROWS() behave on them: they all behave consistently with what NROW() returns on them. Note that these low-level generics were introduced exactly for that: to make multidimensional objects feel linear along their 1st dimension. By using them (and NROW()) instead of [, [<-, c(), or length(), one can handle anything that needs to be handled as a linear object. In particular this is used in the DataFrame code itself to handle any kind of column (e.g. GRanges, Rle, Hits, matrix, data.frame) and to treat it as a linear thing (whether it has dimensions or not).
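The base R side of this can be sketched without any Bioconductor package. NROW() gives the length along the 1st dimension and falls back to length() for objects without dimensions. The helper take_rows() below is hypothetical: it is a rough base R stand-in for the idea behind extractROWS(), not its implementation, and it only handles objects with at most two dimensions.

```r
v  <- 1:5
df <- data.frame(aa = 1:4, bb = letters[1:4])
m  <- matrix(1:6, nrow = 2)

# length() and NROW() agree only for the dimensionless vector.
c(length = length(v),  NROW = NROW(v))    # 5 and 5
c(length = length(df), NROW = NROW(df))   # 2 and 4
c(length = length(m),  NROW = NROW(m))    # 6 and 2

# Hypothetical helper: subset along the 1st dimension, whatever the shape.
take_rows <- function(x, i) {
  if (is.null(dim(x))) {
    x[i]
  } else {
    x[i, , drop = FALSE]
  }
}

NROW(take_rows(v, 2:3))    # 2
NROW(take_rows(df, 2:3))   # 2
NROW(take_rows(m, 1))      # 1
```

Code written against NROW() and a row-wise extractor like this does not need to know whether its input has dimensions, which is the point of the low-level generics described above.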

Best,

H.


Great and detailed answer, this was exactly the type of information I was looking for! Thank you, Hervé Pagès.