I am using SingleCellExperiment with the pipeline proposed in the Bioconductor book, and I plotted the UMAP with the clusters as labels. I also used SingleR to predict cell types and colored the cells in the UMAP by their predicted label.
Interestingly, clusters that are far apart in the UMAP plot contain the same SingleR-predicted cell type (neurons, for example). Conversely, different predicted cell types appear within the same cluster (e.g., neurons and B cells), which is also strange. When I compared differential expression between cells of the same predicted type in different clusters, I found few or no differentially expressed genes (which makes sense), yet these cells cluster separately and lie far apart in the UMAP plot.
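To make the comparison above concrete, here is a minimal base-R sketch on simulated data. The matrix `logexpr` and the vectors `cluster` and `label` are hypothetical placeholders, not the Zeisel data. In the real pipeline, `logcounts(sce.zeisel)`, `colLabels(sce.zeisel)` and `sce.zeisel$celltypes` would take their place. The sketch cross-tabulates clusters against predicted labels, then tests one label between two clusters with a per-gene Wilcoxon rank-sum test and Benjamini-Hochberg correction. In the OSCA workflow, `scran::findMarkers()` (for example with `test.type="wilcox"`) would be the usual tool. This only mirrors its logic.

```r
# Toy stand-in for the comparison described above (simulated, not Zeisel data)
set.seed(42)
n_genes <- 200
n_cells <- 120

# Hypothetical log-expression matrix: every cell drawn from the same distribution
logexpr <- matrix(rnorm(n_genes * n_cells, mean = 2), nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("cell", seq_len(n_cells))))

# Hypothetical cluster assignments and predicted labels
cluster <- factor(sample(c("1", "2", "3"), n_cells, replace = TRUE))
label <- sample(c("Neurons", "B_cell"), n_cells, replace = TRUE)

# Cross-tabulate labels against clusters: a label spread over several
# clusters shows up as several non-zero entries in its row
tab <- table(label, cluster)
print(tab)

# DE for one label between two clusters: per-gene Wilcoxon rank-sum test
in_a <- label == "Neurons" & cluster == "1"
in_b <- label == "Neurons" & cluster == "2"
pvals <- apply(logexpr, 1, function(x) wilcox.test(x[in_a], x[in_b])$p.value)

# Benjamini-Hochberg adjustment across genes
padj <- p.adjust(pvals, method = "BH")

# Number of genes called DE at a 5% FDR
n_de <- sum(padj < 0.05)
print(n_de)
```

Because all simulated cells come from one distribution, few or no genes are expected to pass the threshold here. That matches what I saw between neuron clusters.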
I used the same code and clustering methods as suggested in the Bioconductor single-cell book, Orchestrating Single-Cell Analysis with Bioconductor (https://osca.bioconductor.org/).
Any help would be appreciated.
Here is the code I am using. If you run it, you can see even before clustering that the cells predicted as B cells are scattered across the UMAP rather than grouped in one cluster:
    library(scRNAseq)  # ZeiselBrainData()
    library(scater)    # QC metrics, logNormCounts(), runUMAP(), plotUMAP()
    library(scran)     # quickCluster(), computeSumFactors(), HVGs, denoisePCA(), buildSNNGraph()
    library(SingleR)   # SingleR()
    library(celldex)   # HumanPrimaryCellAtlasData()

    # dataset
    sce.zeisel <- ZeiselBrainData()

    # QC: drop cells with an outlying mitochondrial percentage
    is.mito <- which(rowData(sce.zeisel)$featureType == "mito")
    stats <- perCellQCMetrics(sce.zeisel, subsets=list(Mito=is.mito))
    high.mito <- isOutlier(stats$subsets_Mito_percent, type="higher")
    sce.zeisel <- sce.zeisel[, !high.mito]

    # normalization with deconvolution size factors
    set.seed(1000)
    clusters <- quickCluster(sce.zeisel)
    sce.zeisel <- computeSumFactors(sce.zeisel, clusters=clusters)
    sce.zeisel <- logNormCounts(sce.zeisel)

    # variance modelling, HVG selection, PCA
    set.seed(1001)
    dec.sce.zeisel <- modelGeneVarByPoisson(sce.zeisel)
    top.sce.zeisel <- getTopHVGs(dec.sce.zeisel, prop=0.1)
    set.seed(10000)
    sce.zeisel <- denoisePCA(sce.zeisel, subset.row=top.sce.zeisel, technical=dec.sce.zeisel)
    set.seed(1000000)
    sce.zeisel <- runUMAP(sce.zeisel, dimred="PCA")

    # cell type prediction with the Human Primary Cell Atlas reference
    hpca.se <- HumanPrimaryCellAtlasData()
    pred <- SingleR(test=sce.zeisel, ref=hpca.se, labels=hpca.se$label.main)
    table(pred$labels)
    sce.zeisel$celltypes <- pred$pruned.labels

    # UMAP colored by predicted cell type
    plotUMAP(sce.zeisel, colour_by="celltypes")

    # clustering (the choice of k affects the separation between clusters)
    g <- buildSNNGraph(sce.zeisel, k=10, use.dimred="PCA")
    clust <- igraph::cluster_walktrap(g)$membership
    colLabels(sce.zeisel) <- factor(clust)

    # UMAP with clusters
    plotUMAP(sce.zeisel, colour_by="label", text_by="label")
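One sanity check for a setup like this is how many genes the test data and the reference share, because SingleR can only compare genes present in both. The code above applies `hpca.se`, a human reference, to mouse brain data whose row names are mouse-style gene symbols (mixed case rather than all upper case). Below is a minimal base-R sketch with hypothetical toy symbols. On the real objects, the same idea would be `length(intersect(rownames(sce.zeisel), rownames(hpca.se)))`.

```r
# Hypothetical toy symbols; on real data use rownames(sce.zeisel) and rownames(hpca.se)
test_genes <- c("Snap25", "Gad1", "Mbp", "Aqp4", "Cd79a")       # mouse-style symbols
ref_genes  <- c("SNAP25", "GAD1", "MBP", "AQP4", "CD79A", "PTPRC") # human-style symbols

# Exact matching is case-sensitive, so mouse-style and human-style symbols do not match
shared_exact <- intersect(test_genes, ref_genes)

# Crude check only: upper-casing mouse symbols is NOT a proper ortholog mapping
shared_upper <- intersect(toupper(test_genes), ref_genes)

cat("shared (exact):", length(shared_exact), "\n")
cat("shared (upper-cased):", length(shared_upper), "\n")
```

Upper-casing is only a rough proxy used here to illustrate the mismatch. A careful comparison would rely on a proper ortholog mapping.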