(cross-posted from GitHub issues page)
Hello,
Thanks for this helpful package. I have a question about mapping my scRNAseq data to the Immgen microarray data as the reference. As you probably already know, Immgen contains microarray data from hundreds of sorted immune cell populations from mice. I was able to get something to work, but I'd like to double-check that I'm using the package appropriately. Please see my code below:
Setup scRNAseq data
`combined` is a Seurat object containing 4 separate 10x runs aggregated together. The experiment has 15 clusters, numbered 1 to 15.
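library(SingleCellExperiment)
library(scmap)

# Build a SingleCellExperiment from the Seurat object's raw and normalized counts
sce <- SingleCellExperiment(assays = list(counts = as.matrix(combined@assays$RNA@counts),
                                          logcounts = as.matrix(combined@assays$RNA@data)),
                            colData = combined@meta.data)
rowData(sce)$feature_symbol <- rownames(sce)
# Select informative features for scmap and check how many were chosen
sce <- selectFeatures(sce, suppress_plot = FALSE)
table(rowData(sce)$scmap_features)
# Build the cluster index (one expression profile per cluster) and inspect it
sce <- indexCluster(sce, cluster_col = "numeric_clusters")
head(metadata(sce)$scmap_cluster_index)
heatmap(as.matrix(metadata(sce)$scmap_cluster_index))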
Setup Reference dataset
I previously prepared the Immgen microarray data using RMA normalization and saved it as `immgen.rds`. In this dataset I collapsed the biological replicates (most cell types have 3 replicates) into one data point by averaging after normalization. The result is a data frame containing normalized, log2-transformed expression values, which I read in as the reference for scmap. `immannot` provides long names and metadata for the individual samples.
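For readers who want to see what the replicate-collapsing step looks like, here is a minimal base R sketch. It is not the code actually used to build `immgen.rds`: the toy matrix, the sample names, and the `cell_type` vector are all made up for illustration, and it assumes the values are already normalized and on the log2 scale.

```r
# Toy log2 expression matrix: 3 genes x 5 samples (values are made up)
expr <- matrix(
  c(8.1, 8.3, 8.2, 5.0, 5.4,
    2.0, 2.2, 2.1, 9.0, 9.2,
    6.5, 6.7, 6.3, 6.0, 6.2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("MF_1", "MF_2", "MF_3", "Mo_1", "Mo_2"))
)

# Hypothetical mapping from each sample (column) to its cell type
cell_type <- c("MF", "MF", "MF", "Mo", "Mo")

# Group column indices by cell type, then average the replicate columns
collapsed <- sapply(split(seq_len(ncol(expr)), cell_type),
                    function(idx) rowMeans(expr[, idx, drop = FALSE]))

collapsed
```

The result has one column per cell type, so the number of replicates per type does not need to be the same everywhere.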
immgen <- readRDS("immgen.rds")
immannot <- readRDS("immannot.rds")
# Use every column except the first as expression values, converted to a matrix
ref_sce <- SingleCellExperiment(assays = list(logcounts = as.matrix(immgen[, 2:ncol(immgen)])),
                                colData = immannot)
rownames(ref_sce) <- immgen$GeneName
rowData(ref_sce)$feature_symbol <- immgen$GeneName
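Because the two datasets come from different platforms, it may be worth checking how many gene symbols they share before mapping. The sketch below uses short hypothetical symbol vectors; in practice they would come from `rowData(sce)$feature_symbol` and `rowData(ref_sce)$feature_symbol`. The idea that older array annotations may carry outdated symbols (for example `Emr1` for what is now `Adgre1`) is an assumption about the data, not something checked here.

```r
# Hypothetical gene symbols standing in for the query and reference features
query_symbols <- c("Cd3e", "Cd19", "Itgam", "Adgre1", "Gm12345")
ref_symbols   <- c("Cd3e", "Cd19", "Itgam", "Emr1", "Ly6c1")

# Symbols present in both datasets
shared <- intersect(query_symbols, ref_symbols)

# Query symbols with no match in the reference
missing_in_ref <- setdiff(query_symbols, ref_symbols)

# Fraction of query symbols that can be matched by name
overlap_fraction <- length(shared) / length(query_symbols)

shared
missing_in_ref
overlap_fraction
```

A low overlap fraction would suggest harmonizing the symbols before reading much into the similarity scores.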
Map my own unknown data to immgen reference
scmapCluster_results <- scmapCluster(
  projection = ref_sce,
  index_list = list(
    oconnell = metadata(sce)$scmap_cluster_index
  ),
  threshold = 0  # I'm keeping this very low to see even the weakly associated cell types
)
# Prepare an informative and easy-to-explore data frame of the results
res_df <- data.frame(label = as.character(scmapCluster_results$scmap_cluster_labs),
                     similarity = as.character(scmapCluster_results$scmap_cluster_siml),
                     immgen_short = immannot$short_name,
                     immgen_type = immannot$reference_cell_type)
head(res_df)
##   label        similarity   immgen_short immgen_type
## 1    11 0.871871179824524 MF.11c-11b+.Lu  Macrophage
## 2     8 0.491127572913492      MF.Alv.Lu  Macrophage
## 3     9 0.666813607696411     Mo.6+2+.BL    Monocyte
## 4     9 0.591040179641838    Mo.6+2+.MLN    Monocyte
## 5     7 0.882301399545328    Mo.6+2+.SLN    Monocyte
## 6     9 0.667127563564599     Mo.6+2-.BL    Monocyte
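Since `similarity` is stored as character in `res_df`, it has to be converted to numeric before it can be sorted. As a rough sketch of one way to summarize the table, the snippet below re-enters the six rows printed above by hand and picks the highest-scoring Immgen sample for each cluster label. The names `ranked` and `best_per_cluster` are made up for this example.

```r
# The six rows printed above, re-entered by hand so the example is self-contained
res_df <- data.frame(
  label = c("11", "8", "9", "9", "7", "9"),
  similarity = c("0.871871179824524", "0.491127572913492", "0.666813607696411",
                 "0.591040179641838", "0.882301399545328", "0.667127563564599"),
  immgen_short = c("MF.11c-11b+.Lu", "MF.Alv.Lu", "Mo.6+2+.BL",
                   "Mo.6+2+.MLN", "Mo.6+2+.SLN", "Mo.6+2-.BL"),
  immgen_type = c("Macrophage", "Macrophage", "Monocyte",
                  "Monocyte", "Monocyte", "Monocyte"),
  stringsAsFactors = FALSE
)

# similarity was stored as character, so convert it before sorting
res_df$similarity <- as.numeric(res_df$similarity)

# Order by cluster label, then by decreasing similarity within each label
ranked <- res_df[order(res_df$label, -res_df$similarity), ]

# Keep the first (best-scoring) Immgen sample for each cluster label
best_per_cluster <- ranked[!duplicated(ranked$label), ]
best_per_cluster
```

On the full table, the same two steps would give one best-matching reference sample per cluster, which is easier to compare across tools than the per-sample listing.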
Can you see anything wrong here? Is it a problem to project between scRNAseq data and a bulk reference?
Best, Atakan
Y'know, SingleR has an `ImmGenData()` function that would save you some trouble.

Hi Aaron,
Thanks for the info. I know about that function and have already used it with SingleR. My reason for asking was a bit different. I wrote a program of my own to help with cluster annotation, and I would like to compare its performance with `SingleR` and `scmap`. I would like to use `scmap` with the identical input that my software uses (immgen microarray data processed the same way).