Hi all,
I am trying to create an ExpressionSet from scRNA-seq data so that I can estimate cell-type composition with the MuSiC deconvolution method. However, the last line of my code gives the following error:
Error in validObject(.Object) :
  invalid class “ExpressionSet” object: 1: sample numbers differ between assayData and phenoData
invalid class “ExpressionSet” object: 2: sampleNames differ between assayData and phenoData
invalid class “ExpressionSet” object: 3: sample numbers differ between phenoData and protocolData
invalid class “ExpressionSet” object: 4: sampleNames differ between phenoData and protocolData
I guess it's due to adding the single-cell ids in column X, but I need to use them.
Meanwhile, dim(subsetexpr) is 24190 x 4114 and dim(phenoData) is 4113 x 7, so the expression matrix has one more column than phenoData has rows.
Also, identical(colnames(subsetexpr), rownames(phenoData)) returns FALSE, which is a problem.
You can see my code below.
Appreciate all your suggestions. Thanks a lot!
# install devtools if necessary
# install.packages('devtools')
# install the MuSiC package
devtools::install_github('xuranw/MuSiC')
# load
library(MuSiC)
library(Biobase)
library(GEOquery)
# loading the count matrix of scRNAseq data
# setwd() returns the previous working directory, so store the path itself
dataDirectory <- "..."
setwd(dataDirectory)
exprsFile <- file.path(dataDirectory, "...count.txt")
exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t"))
# creating the phenotypic data file
pDataFile <- file.path(dataDirectory, "...CellInfo.txt")
pData <- read.table(pDataFile, header=TRUE, sep="\t")
head(pData)
# subsetting between colnames(exprs) and pData$X
i <- which(colnames(exprs) %in% pData$X) # X column contains single-cell ids
length(i)
subsetexpr<-exprs[,i]
exprs[1:5,1:5]
subsetexpr[1:5,1:5]
subsetexpr<-cbind(exprs[,1],subsetexpr) # prepends the first column of exprs, adding one column to subsetexpr
subsetexpr[1:5,1:5]
# creating a data frame containing meta-data
metadata <- data.frame(
  labelDescription = c('cell id', 'number of genes', 'number of UMI',
                       'origin identity', 'percent mito', 'cell type',
                       'clusters'),
  row.names = c('X', 'nGene', 'nUMI', 'origin.ident', 'percent.mito',
                'cell_type', 'subcluster'))
phenoData <- new("AnnotatedDataFrame", data=pData, varMetadata=metadata) #4113 x 7
# assembling everything into an ExpressionSet
assemb.expSet <- ExpressionSet(assayData=subsetexpr, phenoData=phenoData)
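The mismatch described above (4114 columns in the matrix versus 4113 rows in phenoData, and identical() returning FALSE) can be checked on a small example. The following is only a minimal sketch in base R with toy data: the matrix, the data frame and the cell names are hypothetical stand-ins for the real files. It assumes that the extra column comes from the cbind() step and that pData still has its default numeric row names instead of the cell ids.

```r
# Toy stand-ins for the real objects (hypothetical data, not the real files)
set.seed(1)
exprs <- matrix(rpois(20, 5), nrow = 4,
                dimnames = list(paste0("gene", 1:4),
                                c("gene_id", "cellA", "cellB", "cellC", "cellD")))
pData <- data.frame(X = c("cellC", "cellA", "cellB"),
                    cell_type = c("T", "B", "T"),
                    stringsAsFactors = FALSE)

# Use the cell ids as row names so they can be compared with the column names
rownames(pData) <- pData$X

# Keep only the cells present in both objects, in one shared order
common <- intersect(colnames(exprs), rownames(pData))
subsetexpr <- exprs[, common, drop = FALSE]
pData <- pData[common, , drop = FALSE]

# Both checks should pass before building the ExpressionSet
stopifnot(ncol(subsetexpr) == nrow(pData))
stopifnot(identical(colnames(subsetexpr), rownames(pData)))
```

If both checks pass on the real data, the AnnotatedDataFrame built from the reordered pData should have the same sample names as the columns of subsetexpr, which is what the validity check in the error message compares. Any column added to subsetexpr afterwards that has no matching row in pData would break this again.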
Many thanks, James! I will check it out.