Hi,
I am using SingleCellExperiment with the pipeline proposed in the Bioconductor book, and I plotted a UMAP with the clusters as labels. I also used SingleR to predict cell types and colored the cells in the UMAP according to their predicted type.
Interestingly, clusters that lie far apart in the UMAP plot contain the same predicted cell types (neurons, for example). In addition, different cell types appear in the same cluster (neurons and B cells), which is strange too. When I tested for differential expression between cells of the same predicted type in different clusters, I found few or no differentially expressed genes (which makes sense), yet these cells still cluster separately and are located apart in the UMAP plot.
I used the same code and clustering methods as suggested in the single-cell Bioconductor book (https://osca.bioconductor.org/).
Any help would be appreciated.
Thanks!
Here is the code I am using. If you run it, you can see even before clustering that the cells predicted as B cells are scattered across the UMAP rather than grouped in one place:
# packages
library(scRNAseq)
library(scater)
library(scran)
library(SingleR)
# dataset
sce.zeisel <- ZeiselBrainData()
# QC, normalization, PCA
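# mitochondrial genes are flagged in the featureType column of rowData for this dataset
stats <- perCellQCMetrics(sce.zeisel, subsets=list(Mito=which(rowData(sce.zeisel)$featureType=="mito")))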
high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher")
sce.zeisel <- sce.zeisel[,!high.mito]
set.seed(1000)
clusters <- quickCluster(sce.zeisel)
sce.zeisel <- computeSumFactors(sce.zeisel, clusters=clusters)
sce.zeisel <- logNormCounts(sce.zeisel)
set.seed(1001)
dec.sce.zeisel <- modelGeneVarByPoisson(sce.zeisel)
top.sce.zeisel <- getTopHVGs(dec.sce.zeisel, prop=0.1)
set.seed(10000)
sce.zeisel <- denoisePCA(sce.zeisel, subset.row=top.sce.zeisel, technical=dec.sce.zeisel)
set.seed(1000000)
sce.zeisel <- runUMAP(sce.zeisel, dimred="PCA")
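# cell type prediction
# hpca.se is not defined above; assumed to be the Human Primary Cell Atlas reference from celldex
hpca.se <- celldex::HumanPrimaryCellAtlasData()
pred <- SingleR(test = sce.zeisel, ref = hpca.se, labels = hpca.se$label.main)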
table(pred$labels)
sce.zeisel$celltypes <- pred$pruned.labels
# UMAP
plotUMAP(sce.zeisel, colour_by="celltypes")
# clustering
g <- buildSNNGraph(sce.zeisel, k=10, use.dimred = 'PCA') # the choice of k affects the separation between the clusters
clust <- igraph::cluster_walktrap(g)$membership
colLabels(sce.zeisel) <- factor(clust)
# UMAP with clusters
plotUMAP(sce.zeisel, colour_by="label", text_by = "label")
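
One way to put numbers on the overlap described above is to cross-tabulate cluster assignments against the predicted labels. This is only a minimal sketch in base R: the two toy vectors are hypothetical stand-ins for `colLabels(sce.zeisel)` and `sce.zeisel$celltypes`, not real output from the pipeline.

```r
# Toy stand-ins (hypothetical values) for the cluster labels and the
# pruned SingleR labels; NA mimics a label that was pruned.
cluster  <- factor(c(1, 1, 1, 2, 2, 3, 3, 3))
celltype <- c("Neurons", "Neurons", "B_cell", "Neurons", "Neurons",
              "B_cell", "B_cell", NA)

# Rows are clusters, columns are predicted labels.
# useNA = "ifany" keeps pruned (NA) predictions visible as their own column.
tab <- table(Cluster = cluster, Label = celltype, useNA = "ifany")
print(tab)

# Fraction of each cluster assigned to each label (each row sums to 1).
frac <- prop.table(tab, margin = 1)
print(round(frac, 2))
```

On the real object the same call would presumably be `table(Cluster = colLabels(sce.zeisel), Label = sce.zeisel$celltypes, useNA = "ifany")`, which shows at a glance which clusters are dominated by a single predicted type and which are mixed.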
Without a reproducible example, who knows?
I added code that uses a publicly available dataset, and I get the same issue. Cells given the same predicted type by SingleR (in this case B cells) dominate several different clusters, which are separated in UMAP space.