Question

How to deal with Expression set assembly problem?

Hasan Alanya • 0

@d269de72

United States

Hi all,

I am trying to create an ExpressionSet from scRNA-seq data so that I can estimate cell-type composition with the MuSiC deconvolution method. However, the last line of my code gives the following error:

Error in validObject(.Object) :
  invalid class “ExpressionSet” object: 1: sample numbers differ between assayData and phenoData
  invalid class “ExpressionSet” object: 2: sampleNames differ between assayData and phenoData
  invalid class “ExpressionSet” object: 3: sample numbers differ between phenoData and protocolData
  invalid class “ExpressionSet” object: 4: sampleNames differ between phenoData and protocolData

I suspect this is due to adding the single-cell IDs in column X, but I need to use them.

Meanwhile, dim(subsetexpr) is 24190 x 4114 and dim(phenoData) is 4113 x 7, so the expression matrix has one more column than the phenotype data has rows.

Also, identical(colnames(subsetexpr), rownames(phenoData)) returns FALSE, which is a problem.
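A mismatch like this can usually be pinned down by comparing the two sets of names directly. The following is only a minimal sketch in base R on made-up toy data: `expr_toy`, `pheno_toy`, and the cell names are hypothetical stand-ins, not the actual files from this post.

```r
# Toy stand-ins (hypothetical data, not the poster's files)
expr_toy <- matrix(1:12, nrow = 3,
                   dimnames = list(paste0("gene", 1:3),
                                   c("V1", "cellA", "cellB", "cellC")))
pheno_toy <- data.frame(X = c("cellA", "cellB", "cellC"),
                        nGene = c(10, 20, 30),
                        stringsAsFactors = FALSE)

# Columns of the matrix that have no matching cell id in the metadata
extra_cols <- setdiff(colnames(expr_toy), pheno_toy$X)

# Cell ids in the metadata that are missing from the matrix
missing_cells <- setdiff(pheno_toy$X, colnames(expr_toy))

print(extra_cols)
print(missing_cells)

# Row names default to "1", "2", ... unless they are set explicitly,
# so they will not match the cell ids stored in column X
print(rownames(pheno_toy))
```

If `extra_cols` is non-empty, the matrix contains columns with no phenotype record; if the row names of the phenotype table are just sequential numbers, the sample names cannot match even when the counts agree.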

You can see my code below.

Appreciate all your suggestions. Thanks a lot!


# install devtools if necessary
# install.packages('devtools')

# install the MuSiC package
devtools::install_github('xuranw/MuSiC')

# load
library(MuSiC)
library(Biobase)
library(GEOquery)

# loading the count matrix of scRNAseq data 

# setwd() returns the previous working directory, so store the path itself
dataDirectory <- "..."
setwd(dataDirectory)
exprsFile <- file.path(dataDirectory, "...count.txt")
exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t"))

# creating the phenotypic data file

pDataFile <- file.path(dataDirectory, "...CellInfo.txt")
pData <- read.table(pDataFile, header=TRUE, sep="\t")  
head(pData)

# subsetting between colnames(exprs) and pData$X

i<-which(colnames(exprs) %in% pData$X) # X col contains single-cell ids
length(i)
subsetexpr<-exprs[,i]
exprs[1:5,1:5]
subsetexpr[1:5,1:5]

# prepend the first column of exprs (this adds one extra column to subsetexpr)
subsetexpr<-cbind(exprs[,1],subsetexpr)
subsetexpr[1:5,1:5]

# creating a data frame containing meta-data

metadata <- data.frame(labelDescription=c('cell id', 'number of genes', 'number of UMI',
                                          'origin identity', 'percent mito', 'cell type',
                                          'clusters'),
                       row.names=c('X', 'nGene', 'nUMI', 'origin.ident', 'percent.mito',
                                   'cell_type', 'subcluster'))

phenoData <- new("AnnotatedDataFrame", data=pData, varMetadata=metadata) #4113 x 7

# assembling everything into an ExpressionSet

assemb.expSet <- ExpressionSet(assayData=subsetexpr, phenoData=phenoData)

deconvolution r scRNAseq bioinformatics

score 2 · Accepted Answer · 2022-02-08

James W. MacDonald 68k

@james-w-macdonald-5106

United States

The ExpressionSet class isn't really intended for scRNA-Seq data. You would be better served by using the SingleCellExperiment class. However, you will still have problems if you don't have as many rows in your colData object as you have columns in your 'counts' object. That's part of the validity checking: you must have information for each sample.
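To illustrate the validity requirement, here is a minimal sketch that aligns a count matrix with its per-cell metadata before building the object. The toy objects (`counts_toy`, `cell_info`) and cell names are hypothetical, and the sketch assumes the cell ids live in a column called X, as in the question. The constructor call is guarded so the alignment step still runs if the SingleCellExperiment package is not installed.

```r
# Toy count matrix with one cell ("cellD") that has no metadata (hypothetical data)
counts_toy <- matrix(1:12, nrow = 3,
                     dimnames = list(paste0("gene", 1:3),
                                     c("cellA", "cellB", "cellC", "cellD")))

# Toy metadata, deliberately in a different order than the matrix columns
cell_info <- data.frame(X = c("cellC", "cellA", "cellB"),
                        cell_type = c("T", "B", "NK"),
                        stringsAsFactors = FALSE)

# Use the cell ids as row names so they can be matched to the matrix columns
rownames(cell_info) <- cell_info$X

# Keep only cells present in both objects, in the same order
shared <- intersect(colnames(counts_toy), rownames(cell_info))
counts_aligned <- counts_toy[, shared, drop = FALSE]
cell_info_aligned <- cell_info[shared, , drop = FALSE]

# This is the condition the validity check depends on
stopifnot(identical(colnames(counts_aligned), rownames(cell_info_aligned)))

# Build the object only if the Bioconductor package is available
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = counts_aligned),
    colData = S4Vectors::DataFrame(cell_info_aligned)
  )
  print(dim(sce))
}
```

The same alignment applies to an ExpressionSet: the column names of the assay matrix and the row names of the phenotype table must be identical, including their order.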