Question: Difficulty extending ExpressionSet
8 months ago by
andrew.birnberg20 wrote:

Hello, this question was asked here about 7 years ago (On extending class ExpressionSet); however, either the answer has changed since then or the problem I'm facing is not the same.

I am trying to write a class that extends ExpressionSet. I keep running into the following error:

Error in checkSlotAssignment(object, name, value) :
assignment of an object of class “matrix” is not valid for slot ‘assayData’ in an object of class “ExpressionSet”; is(value, "AssayData") is not TRUE


My class is simply a thin wrapper around ExpressionSet that adds a couple of slots and lets me write some specific methods that would not directly apply to an ExpressionSet instance.

By way of the simplest example, the following seems to trigger the error:

library(Biobase)  # makes the ExpressionSet class definition available

setClass("MyExpressionSet",
    contains = "ExpressionSet"
) -> MyExpressionSet

mat <- matrix(runif(100), nrow = 20, ncol = 5)

# Fails:
my_instance <- MyExpressionSet(mat)

# Works:
normal_instance <- Biobase::ExpressionSet(mat)


The error you get for the above is:

Error: MyExpSet 'assayData' is class 'matrix' but should be or extend 'AssayData'


I've looked at the developer documentation on how to extend eSet, but its recommendations, particularly those concerning the initialize method, did not resolve my issue. For example, naming all arguments and writing a new initialize method that passes only the relevant arguments to the ExpressionSet initializer did not help.

Based on the answer to the question at the above link, I understand that there are peculiarities with how the initialize method was written for ExpressionSet and eSet that make it difficult to extend. I have a feeling the issue may be with the validObject function, but I'm not sure.

Has anyone on this forum successfully extended eSet or ExpressionSet without having to include modified versions of the original class definitions in their package? I've taken a look at MSnbase (https://github.com/lgatto/MSnbase), which has several classes that extend eSet; however, they use deprecated S4 syntax and have essentially copied large portions of the initialize method for eSet in order to make their classes work.

Any advice on this would be very much appreciated.

Thanks!

Andrew

Session info:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin17.5.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0      parallel_3.5.0      tools_3.5.0         yaml_2.1.19         Biobase_2.40.0
[6] BiocGenerics_0.26.0

8 months ago by
Martin Morgan ♦♦ 23k wrote:

As Aaron says, new development should focus on SummarizedExperiment.

Since 'mat' is the exprs, try

MyExpressionSet(exprs=mat)

Normally one would not expose the 'raw' constructor returned by setClass(), but use that in a constructor that performed initial argument checking / coercion, etc.
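
A minimal sketch of that pattern, assuming Biobase is installed. The generator name .MyExpressionSet and the particular argument checks are illustrative choices, not something prescribed by Biobase:

```r
library(Biobase)

# Keep the raw generator returned by setClass() under a non-exported name
# (the leading dot is only a naming convention, assumed here)
.MyExpressionSet <- setClass("MyExpressionSet", contains = "ExpressionSet")

# User-facing constructor: check / coerce the input, then pass it on
# under the argument name that the ExpressionSet initializer expects
MyExpressionSet <- function(exprs, ...) {
    exprs <- as.matrix(exprs)
    stopifnot(is.numeric(exprs))
    .MyExpressionSet(exprs = exprs, ...)
}

mat <- matrix(runif(100), nrow = 20, ncol = 5)
my_instance <- MyExpressionSet(mat)
dim(exprs(my_instance))
```

With this wrapper the user can pass the matrix positionally, because the wrapper, not the user, is responsible for naming the exprs argument.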

If ExpressionSet were implemented following standard S4 conventions, then one would expect

MyExpressionSet(ExpressionSet(exprs=mat))

to work -- the unnamed argument(s) to new() are meant to initialize the base class. So even under normal circumstances one would not expect MyExpressionSet(mat) to work -- MyExpressionSet does not extend matrix.
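
The convention can be seen with two small classes that have nothing to do with Biobase. This is only a sketch; BaseThing, DerivedThing and their slots are hypothetical names made up for the illustration:

```r
library(methods)

# Hypothetical classes for illustration only
setClass("BaseThing", slots = c(values = "numeric"))
DerivedThing <- setClass("DerivedThing",
    contains = "BaseThing",
    slots = c(label = "character")
)

base_obj <- new("BaseThing", values = c(1, 2, 3))

# The unnamed argument is an instance of the superclass, so it initializes
# the inherited part; named arguments fill the remaining slots
d <- DerivedThing(base_obj, label = "example")
d@values
d@label

# An unnamed argument from a class that DerivedThing does not extend
# (here a matrix) is rejected by the default initialize method
res <- tryCatch(DerivedThing(matrix(0, 2, 2)), error = function(e) "error")
res
```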

Thanks! I guess I had assumed that since no initialize method was defined for the subclass it would pass all its arguments unchanged to the parent class initialize method, like an implicit callNextMethod. I need to look more into SummarizedExperiment and how difficult it would be to interoperate with existing code designed with eSet derivatives in mind. A lot of limma code can be executed just with the exprs matrix, so it might not be much of a problem in my use case.

It does do as you say, modulo first constructing a prototype 'MyExpressionSet' object, .Object. So your matrix is passed as the 'assayData' argument in the initialize signature, which is not what you want: in the call that works, ExpressionSet() has already transformed the matrix into an environment containing it before assayData is set. This illustrates the general problem with initialize: it exposes the internal structure of the class, which is really none of the user's business. Naming the argument, exprs = mat, works because it matches the correct argument for your use case. The only way you'd know this is by closely examining the code, which is again generally not best practice.

Bioconductor has learned from at least some of its mistakes, and the SummarizedExperiment class is much better behaved in the way it works, in particular offering a 'real' constructor SummarizedExperiment() that is more than just the renamed call to new() returned by setClass().

These discussions belong on the bioc-devel mailing list.

> selectMethod("initialize", "MyExpressionSet")
Method Definition:

function (.Object, ...)
{
    .local <- function (.Object, assayData, phenoData, featureData,
        exprs = new("matrix"), ...)
    {
        if (missing(assayData)) {
            if (missing(phenoData))
8 months ago by
Aaron Lun 23k wrote:

This seems like a package development question, which belongs on the Bioc-devel mailing list.

While I'm here, I'll note that SummarizedExperiment is really the way to go if you're creating a new class for some expression-like data structure (i.e., samples in columns, features in rows). There are some "canonical" instructions on how to extend the SE class; see the SummarizedExperiment vignettes for more details.
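
As a rough sketch of what such an extension can look like, assuming the SummarizedExperiment package is installed: the class name MySE, the extra slot batch and the assay name "exprs" are hypothetical, and the vignette remains the reference for the full recipe (validity, show methods, subsetting, and so on).

```r
library(SummarizedExperiment)

# Hypothetical subclass with one extra slot, 'batch' (an assumed name)
.MySE <- setClass("MySE",
    contains = "SummarizedExperiment",
    slots = c(batch = "character")
)

# Constructor: build the base object first, then hand it to the generator
# as the unnamed argument so that it initializes the inherited part
MySE <- function(exprs, batch = character(0), ...) {
    se <- SummarizedExperiment(assays = list(exprs = exprs), ...)
    .MySE(se, batch = batch)
}

mat <- matrix(runif(100), nrow = 20, ncol = 5)
obj <- MySE(mat, batch = "run1")
dim(assay(obj, "exprs"))
obj@batch
```

Here the extra slot costs one line in setClass() and one argument in the constructor; the matrix never has to be converted by hand.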

8 months ago by
Steve Lianoglou 12k wrote:

The RccSet class in the NanoStringQCPro package extends ExpressionSet, and I don't really see any wholesale copying of any code in there at all.

I see what you're saying. After looking over the code for the RccSet class definition a bit, it looks like they had to create a new assayData environment instance out of a matrix before calling the ExpressionSet constructor and made a bunch of methods to cover their bases depending on the input. However, I still don't understand why this isn't "automatic" when you're inheriting from ExpressionSet. The code is definitely helpful, although it feels like a lot of work just to add a single slot to the class...