How to deal with an ExpressionSet assembly problem?
@d269de72

Hi all,

I am trying to create an ExpressionSet from scRNA-seq data so that I can identify cell-type composition with the MuSiC deconvolution method. However, the last line of my code fails with the following error:

Error in validObject(.Object) :
  invalid class “ExpressionSet” object: 1: sample numbers differ between assayData and phenoData
  invalid class “ExpressionSet” object: 2: sampleNames differ between assayData and phenoData
  invalid class “ExpressionSet” object: 3: sample numbers differ between phenoData and protocolData
  invalid class “ExpressionSet” object: 4: sampleNames differ between phenoData and protocolData

I guess that it's due to adding the single-cell ids in column X, but I need to use them.

Meanwhile, dim(subsetexpr) is 24190 x 4114 and dim(phenoData) is 4113 x 7.

Also, identical(colnames(subsetexpr), rownames(phenoData)) returns FALSE, which is a problem.

You can see my code below.

Appreciate all your suggestions. Thanks a lot!


# install devtools if necessary
# install.packages('devtools')

# install the MuSiC package
devtools::install_github('xuranw/MuSiC')

# load
library(MuSiC)
library(Biobase)
library(GEOquery)
# loading the count matrix of scRNAseq data 

# setwd() returns the previous working directory, so store the path itself
dataDirectory <- "..."
exprsFile <- file.path(dataDirectory, "...count.txt")
exprs <- as.matrix(read.table(exprsFile, header=TRUE, sep="\t"))

# creating the phenotypic data file

pDataFile <- file.path(dataDirectory, "...CellInfo.txt")
pData <- read.table(pDataFile, header=TRUE, sep="\t")  
head(pData)
# subsetting between colnames(exprs) and pData$X

i <- which(colnames(exprs) %in% pData$X) # column X contains the single-cell ids
length(i)
subsetexpr<-exprs[,i]
exprs[1:5,1:5]
subsetexpr[1:5,1:5]

subsetexpr<-cbind(exprs[,1],subsetexpr)
subsetexpr[1:5,1:5]
# creating a data frame containing meta-data

metadata <- data.frame(labelDescription=c('cell id', 'number of genes', 'number of UMI',
                                          'origin identity', 'percent mito', 'cell type',
                                          'clusters'),
                       row.names=c('X', 'nGene', 'nUMI', 'origin.ident', 'percent.mito',
                                   'cell_type', 'subcluster'))

phenoData <- new("AnnotatedDataFrame", data=pData, varMetadata=metadata) #4113 x 7

# assembling everything into an ExpressionSet

assemb.expSet <- ExpressionSet(assayData=subsetexpr, phenoData=phenoData)
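
The two dimensions reported above already hint at where the mismatch may come from: the `cbind(exprs[,1], subsetexpr)` step adds one column that has no counterpart in the phenotype table (4114 versus 4113), and a table read with `read.table` without a `row.names` argument typically gets row names "1", "2", ... rather than cell ids. A minimal sketch for checking both, using small made-up stand-ins (`counts` and `cell_info` are hypothetical names, not objects from the post):

```r
# Toy stand-ins for the real objects; all names and values are hypothetical.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3),
                                 c("V1", "cellA", "cellB", "cellC")))
cell_info <- data.frame(X = c("cellA", "cellB", "cellC"),
                        nGene = c(10, 20, 30))

# Columns of the matrix that have no matching cell id in the metadata
extra_cols <- setdiff(colnames(counts), cell_info$X)

# Cell ids in the metadata that are missing from the matrix
missing_cells <- setdiff(cell_info$X, colnames(counts))

print(extra_cols)
print(missing_cells)

# Row names of the metadata: these are what get compared with the column names
print(head(rownames(cell_info)))
```

If `extra_cols` is non-empty or the row names turn out to be plain row numbers, the sample-number and sampleNames checks in the error message would both be expected to fail.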
deconvolution r scRNAseq bioinformatics • 1.5k views
@james-w-macdonald-5106

The ExpressionSet class isn't really intended for scRNA-Seq data. You would be better served by using the [SingleCellExperiment][1] class, although you will still have problems if your colData object doesn't have as many rows as your `counts` object has columns. That's part of the validity checking: you must have information for each sample.
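
As a hedged sketch of what that validity check expects, the toy example below (made-up data; `counts`, `cell_info`, and the `*_aligned` names are hypothetical) sets the cell ids as row names, keeps only the cells present in both objects, and puts them in the same order. The constructor call is guarded so that it only runs if the SingleCellExperiment and S4Vectors packages are installed; the same alignment would be needed before building an ExpressionSet.

```r
# Toy stand-ins for the real objects; all names and values are hypothetical.
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3),
                                 c("V1", "cellA", "cellB", "cellC")))
cell_info <- data.frame(X = c("cellC", "cellA", "cellB"),
                        nGene = c(30, 10, 20))

# Use the cell ids as row names so they can be matched to colnames(counts)
rownames(cell_info) <- cell_info$X

# Keep only cells present in both objects, in one shared order
shared <- intersect(colnames(counts), rownames(cell_info))
counts_aligned <- counts[, shared, drop = FALSE]
cell_info_aligned <- cell_info[shared, , drop = FALSE]

# One metadata row per matrix column, with identical names and order
stopifnot(identical(colnames(counts_aligned), rownames(cell_info_aligned)))

# Optional: build the container only if the packages are available
if (requireNamespace("SingleCellExperiment", quietly = TRUE) &&
    requireNamespace("S4Vectors", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(
    assays = list(counts = counts_aligned),
    colData = S4Vectors::DataFrame(cell_info_aligned)
  )
  print(dim(sce))
}
```

Subsetting both objects by the same `shared` vector is what guarantees matching counts and names; subsetting only the matrix, or adding columns afterwards, can leave the two out of sync.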

Many thanks, James! I will check it out.
