SingleCellExperiment package: storing reducedDims metadata
bge • 0
@bge-13787

I installed SingleCellExperiment_1.12.0.

I am unable to store records in the SingleCellExperiment reducedDims metadata slot using accessors; that is, I want to store the information in the slot marked "here" below:

...
  ..@ int_colData        :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. ..@ rownames       : NULL
  .. .. ..@ nrows          : int 0
  .. .. ..@ listData       :List of 3
  .. .. .. ..$ reducedDims:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 0
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()   <----------------------------------------- here
  .. .. .. ..$ altExps    :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 0
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. .. ..$ colPairs   :Formal class 'DFrame' [package "S4Vectors"] with 6 slots
  .. .. .. .. .. ..@ rownames       : NULL
  .. .. .. .. .. ..@ nrows          : int 0
  .. .. .. .. .. ..@ listData       : Named list()
  .. .. .. .. .. ..@ elementType    : chr "ANY"
  .. .. .. .. .. ..@ elementMetadata: NULL
  .. .. .. .. .. ..@ metadata       : list()
  .. .. ..@ elementType    : chr "ANY"
  .. .. ..@ elementMetadata: NULL
  .. .. ..@ metadata       : list()
...

== I try to store information using

> library(SingleCellExperiment)
> sce<-SingleCellExperiment()
> str(reducedDims(sce))
Formal class 'SimpleList' [package "S4Vectors"] with 4 slots
  ..@ listData       : Named list()
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
> metadata(reducedDims(sce))[['e_a']] <- 'a'
> str(metadata(reducedDims(sce))[['e_a']])
 NULL
> str(metadata(reducedDims(sce)))
 list()

== I can sneak information in directly

> sce@int_colData@listData$reducedDims@metadata[['e_a']] <- 'a'
> metadata(reducedDims(sce))
$e_a
[1] "a"

== The following fails too

> sce <- SingleCellExperiment()
> reducedDims(sce)@metadata[['e_a']] <- 'a'
> str(reducedDims(sce))
Formal class 'SimpleList' [package "S4Vectors"] with 4 slots
  ..@ listData       : Named list()
  ..@ elementType    : chr "ANY"
  ..@ elementMetadata: NULL
  ..@ metadata       : list()
> str(metadata(reducedDims(sce)))
 list()

I wonder how I am erring.

I re-installed Bioconductor 3.12 as a precaution.

> BiocManager::version()
[1] ‘3.12’

The sessionInfo is

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0              GenomicRanges_1.42.0        GenomeInfoDb_1.26.2         IRanges_2.24.1
 [7] S4Vectors_0.28.1            BiocGenerics_0.36.0         MatrixGenerics_1.2.0        matrixStats_0.57.0

loaded via a namespace (and not attached):
 [1] lattice_0.20-41        bitops_1.0-6           grid_4.0.3             zlibbioc_1.36.0        XVector_0.30.0         Matrix_1.3-2           tools_4.0.3
 [8] RCurl_1.98-1.2         DelayedArray_0.16.1    compiler_4.0.3         GenomeInfoDbData_1.2.4

Thank you.


Before we get onto solutions: why do you want to do this? Why not just put things in the metadata of the SCE?
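
For reference, here is a minimal sketch of what storing something in the top-level metadata of the SCE would look like. The entry name e_a and its value are just the placeholders from the question; a fitted model object could presumably be stored the same way.

```r
library(SingleCellExperiment)

sce <- SingleCellExperiment()

# metadata() on the SCE itself has a working setter,
# so the assignment is kept on the object.
metadata(sce)[["e_a"]] <- "a"

# Retrieve the stored value again.
metadata(sce)[["e_a"]]
```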


Hi,

I want to store a dimensionality reduction model; in this case, a uwot::umap model. It's possible that I will need to store additional model information in other metadata locations, so I am trying to organize the storage.

Is there a reason why I should not store information in the reducedDims metadata?

Thank you!

Aaron Lun ★ 28k
@alun

There is the current easy way, and the future hard way.

The easy way

Store extra bits and pieces in the attributes of the matrix.

mat <- matrix(runif(100), ncol=2)
attr(mat, "stuff") <- list(A=1, B=2)

This is nice and simple, and the resulting object is still a matrix that is compatible with all downstream functions. It's also pretty clear how to get information out:

attr(mat, "stuff")$A
## [1] 1

The biggest downside is that subsetting the matrix will drop all of the (non-dimension-related) attributes. So if you put this in the reducedDims and then subset the SCE, the attributes are gone. Maybe this is a problem, maybe not.
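
To make the downside concrete, here is a small base-R sketch of the attribute being dropped on subsetting. Copying the attribute back by hand at the end is only one possible workaround, shown for illustration.

```r
mat <- matrix(runif(100), ncol = 2)
attr(mat, "stuff") <- list(A = 1, B = 2)

# Row-subsetting keeps dim (and dimnames) but drops custom attributes.
sub <- mat[1:10, , drop = FALSE]
is.null(attr(sub, "stuff"))
## [1] TRUE

# Possible workaround: copy the attribute across manually after subsetting.
attr(sub, "stuff") <- attr(mat, "stuff")
attr(sub, "stuff")$B
## [1] 2
```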

The hard way

Well, it's not that hard, it just doesn't exist yet.

The modification is relatively simple and involves respecting any mcols() passed to reducedDims<-() if a List is supplied as the value. The key point is to use mcols() instead of metadata() - this allows us to link each piece of metadata to its associated dimensionality reduction entry, such that removing the latter will also remove the former. For example:

library(S4Vectors)
x <- List(PCA=matrix(runif(10), ncol=2), UMAP=matrix(runif(10), ncol=2))
mcols(x)$info <- list(PCA=list(rotation=1), UMAP=list(other="stuff"))

sub <- x[2]
mcols(sub)$info
## $UMAP
## $UMAP$other
## [1] "stuff"

You can see how the PCA's information was automatically removed, so it's not sticking around and confusing people when there isn't a PCA result in the list. The advantage of this approach is that it is robust to subsetting of the SCE, as the extra pieces of information will not be dropped like the attributes are. The problem is that it's a little less obvious how to store and retrieve this information.
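
As a rough sketch of how storing and retrieving could be wrapped up, the snippet below works on a plain List, not on an SCE. The helper get_info() is hypothetical (it is not part of S4Vectors or SingleCellExperiment), and building the column with DataFrame(info=I(list(...))) is just one assumed way of setting up the mcols.

```r
library(S4Vectors)

x <- List(PCA = matrix(runif(10), ncol = 2), UMAP = matrix(runif(10), ncol = 2))

# One row of mcols per list element; I() keeps the nested list as a single column.
mcols(x) <- DataFrame(info = I(list(PCA = list(rotation = 1),
                                    UMAP = list(other = "stuff"))))

# Hypothetical helper: look up the 'info' entry for a named element by position.
get_info <- function(x, name) {
    i <- match(name, names(x))
    if (is.na(i)) stop("no element named '", name, "'")
    mcols(x)$info[[i]]
}

# Subsetting the list subsets the mcols rows along with it.
sub <- x["UMAP"]
get_info(sub, "UMAP")$other
```

The lookup goes through names(x) and a position rather than through names on the info column, so it does not depend on how the column's own names are handled.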

If you are interested in the second approach, raise an issue on the GitHub repository and we'll think about how to implement it.


Hi,

Thank you for the clear descriptions and examples. I am far from adept at comprehending R so the details are invaluable to me.

I will need to subset the data so I will complicate your life a bit.

Is it OK if I ask more naive questions about the SingleCellExperiment and parent classes?

I thank you for your patience and help.


Hi,

I have more questions from the perspective of a developer who defines a class that inherits from SingleCellExperiment.

As a developer, I want to write and read the elementMetadata and metadata fields in an internal SingleCellExperiment slot. Is this considered to be acceptable?

Experimentation shows that I can do so by addressing fields directly, for example, (based on your example)

sce <- SingleCellExperiment()
sce@int_colData@listData$reducedDims <- List(PCA=matrix(runif(10), ncol=2), UMAP=matrix(runif(10), ncol=2))
mcols(sce@int_colData@listData$reducedDims)$info <- list(PCA=list(rotation=1), UMAP=list(other="stuff"))

I am guessing that this strategy is frowned upon because I am addressing the fields explicitly, and for other reasons possibly. Or am I wrong? If there are other reasons, could you explain them, please?

I can use the int_colData() accessor to limit the direct field addressing somewhat, which may be an improvement but still does not adhere to the OO model. Is there a better/safer way for developers to access these fields?

It appears that the number of reducedDims@elementMetadata@listData$xx elements must be the same as the number of reducedDims@listData elements. Is that true? If so, is there a recommended way to initialize the elementMetadata list when assigning values to only a few of the elements?

Are there other important constraints? If so, is there a single place where I can find a list of them?

I appreciate your patience and help.

Thank you.


These developer-level questions are best asked on the Bioc-devel mailing list.


Hi Aaron,

Thank you. I appreciate the guidance and help.
