The error message looks pretty clear to me:
```r
library(SingleCellExperiment)
example(SingleCellExperiment, echo=FALSE) # generate 'sce'
sce2 <- sce
reducedDim(sce2, "PCA") <- reducedDim(sce2, "PCA")[,1,drop=FALSE]
cbind(sce, sce2)
## Error in value[[3L]](cond) :
## failed to combine 'int_colData' in 'cbind(<SingleCellExperiment>)':
## failed to rbind column 'reducedDims' across DataFrame objects:
## failed to rbind column 'PCA' across DataFrame objects:
## number of columns of matrices must match (see arg 2)
```
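The last line of that error is the ordinary base-R complaint from `rbind()` when matrices have different numbers of columns. Here is a minimal sketch with plain matrices standing in for the two `PCA` entries. It assumes, as the error message suggests, that `cbind()` on the `SingleCellExperiment` ends up `rbind()`ing the per-cell reduced-dimension matrices (cells in rows, components in columns). The matrix sizes are arbitrary.

```r
# Stand-ins for reducedDim(sce, "PCA") and reducedDim(sce2, "PCA"):
# cells are rows, principal components are columns (sizes are arbitrary).
pca.1 <- matrix(rnorm(10 * 2), nrow = 10, ncol = 2)
pca.2 <- pca.1[, 1, drop = FALSE]   # keep only PC1, as done for 'sce2'

# Binding the cells together fails: the matrices have 2 and 1 columns.
msg <- tryCatch(rbind(pca.1, pca.2), error = conditionMessage)
print(msg)

# Giving both matrices the same number of columns lets the bind succeed.
fixed <- rbind(pca.1[, 1, drop = FALSE], pca.2)
print(dim(fixed))
```

The practical remedy is therefore on the user's side: make the reduced dimensions of the objects agree before combining them.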
The first line mentions `int_colData` because that is where the reduced dimensions are stored internally, but the rest of the error message states the actual problem: the two `PCA` matrices have different numbers of columns. Technically, we could edit the message to get rid of the `int_colData` line, but that would require us to abandon the auto-generated error messages that we get for free from the `cbind,DataFrame-method` implementation. Intercepting the errors and replacing them with custom messages would require a decent amount of work, and I think the current state is informative enough.
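To illustrate how such a layered message can build up, here is a hypothetical sketch (it is not the actual S4Vectors or SingleCellExperiment code). Each level catches the error from the level below and re-signals it with its own context prepended, so removing one line means intercepting that specific layer. The helper `with_context()` is made up for this example. The `value[[3L]](cond)` call in the real error is how a handler run by base R's `tryCatch()` usually appears.

```r
# Hypothetical helper: evaluate 'expr' and, on failure, re-signal the error
# with 'context' prepended to the original message.
with_context <- function(expr, context) {
  tryCatch(expr, error = function(e) {
    stop(context, ":\n", conditionMessage(e), call. = FALSE)
  })
}

pca.a <- matrix(0, nrow = 3, ncol = 2)
pca.b <- matrix(0, nrow = 3, ncol = 1)

# Two nested layers, mimicking the 'reducedDims' -> 'PCA' levels of the real message.
msg <- tryCatch(
  with_context(
    with_context(rbind(pca.a, pca.b), "failed to rbind column 'PCA'"),
    "failed to rbind column 'reducedDims'"
  ),
  error = conditionMessage
)
cat(msg, "\n")
```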
> In reality it probably makes most sense to actually remove any reducedDims from `cbind`ed objects, since the space that would be plotted by combining coordinates from two different latent spaces makes no sense.
Dropping reduced dimensions would fall under the definition of "surprising and unexpected behavior". `cbind` and other low-level operations should do what they're told: in this case, stick objects together by column. Generally speaking, these operations should be consistent with the behavior of subsetting, so I should be able to do:
```r
sce.first <- sce[,1:100]
sce.second <- sce[,-(1:100)]
re.sce <- cbind(sce.first, sce.second) # should be effectively the same as 'sce'.
```
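The round trip above can be sketched with plain matrices, assuming (as the error message earlier suggests) that column-subsetting a `SingleCellExperiment` subsets the rows of each reduced-dimension matrix and that `cbind()` binds them back row-wise. The objects below are hypothetical stand-ins: an assay with genes in rows and cells in columns, and a PCA matrix with cells in rows.

```r
set.seed(1)
n.cells <- 250
counts <- matrix(rpois(10 * n.cells, 5), nrow = 10)   # genes x cells
pca <- matrix(rnorm(n.cells * 2), ncol = 2)            # cells x PCs

# Split the cells, as with sce[,1:100] and sce[,-(1:100)].
first <- 1:100
counts.first <- counts[, first];   pca.first <- pca[first, , drop = FALSE]
counts.second <- counts[, -first]; pca.second <- pca[-first, , drop = FALSE]

# Recombine: columns of the assay, rows of the reduced dimensions.
re.counts <- cbind(counts.first, counts.second)
re.pca <- rbind(pca.first, pca.second)

print(identical(re.counts, counts))
print(identical(re.pca, pca))
```

Both pieces come back unchanged, which is exactly the consistency that would break if `cbind()` silently discarded the reduced dimensions.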
Low-level operations should not judge the statistical or scientific sensibility of the result, only the coherency of the data structure. Indeed, in the case above, the reduced dimensions correspond to the same latent space, so it's eminently sensible to plot the result. More generally, I've needed to store multiple t-SNEs for separate subsets of the same dataset; the most efficient approach is to bind all the t-SNEs into a single `reducedDim` entry and inform downstream applications that the coordinates should only be plotted for one subset at a time.
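A minimal sketch of that storage pattern with plain matrices follows. The coordinates are random placeholders, not real t-SNE output. In a `SingleCellExperiment`, the bound matrix would presumably go into one entry such as `reducedDim(sce, "TSNE")` and the labels into a `colData` column; the entry name, the labels, and the helper `coords_for_subset()` are all hypothetical.

```r
set.seed(42)
# Placeholder coordinates standing in for t-SNEs run separately on two subsets.
tsne.a <- matrix(rnorm(60 * 2), ncol = 2)   # 60 cells in subset "A"
tsne.b <- matrix(rnorm(40 * 2), ncol = 2)   # 40 cells in subset "B"

# Bind into one cells x 2 matrix (a single reducedDim-style entry) and keep
# a per-cell label recording which t-SNE each row came from.
tsne.all <- rbind(tsne.a, tsne.b)
subset.label <- rep(c("A", "B"), times = c(nrow(tsne.a), nrow(tsne.b)))

# Hypothetical downstream helper: return one subset's coordinates only, so
# points from different t-SNE runs never share the same axes.
coords_for_subset <- function(coords, labels, which) {
  coords[labels == which, , drop = FALSE]
}

# e.g. plot(coords_for_subset(tsne.all, subset.label, "B"),
#           xlab = "t-SNE 1", ylab = "t-SNE 2")
```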