Question about using sorted cell bulk microarray data (ImmGen) as reference
atakanekiz

(cross-posted from GitHub issues page)


Thanks for this helpful package. I have a question about mapping my scRNA-seq data to the ImmGen microarray data as a reference. As you probably already know, ImmGen contains microarray data from hundreds of sorted mouse immune cell populations. I was able to get something to work, but I'd like to double-check that I'm using the package appropriately. Please see my code below:

Set up scRNA-seq data

combined is a Seurat object that aggregates 4 separate 10x runs. The cells fall into 15 clusters, named with the numbers 1 to 15.

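library(SingleCellExperiment)
library(scmap)

# Convert the Seurat object (Seurat v3/v4 slot layout) to a SingleCellExperiment;
# selectFeatures() needs both raw counts and log-normalized values
sce <- SingleCellExperiment(assays = list(counts = as.matrix(combined@assays$RNA@counts),
                                          logcounts = as.matrix(combined@assays$RNA@data)),
                            colData = combined@meta.data)  # assumed to hold "numeric_clusters"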

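# scmap matches genes between datasets using this column
rowData(sce)$feature_symbol <- rownames(sce)

# Select informative genes and show the diagnostic plot
sce <- selectFeatures(sce, suppress_plot = FALSE)

# Build the cluster index from the labels in colData(sce)$numeric_clusters
sce <- indexCluster(sce, cluster_col = "numeric_clusters")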
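As a rough illustration of what the cluster index contains: scmap-cluster represents each cluster by a centroid, which as far as I understand is the per-gene median over that cluster's cells, restricted to the selected features. The base-R sketch below computes such medians on a plain genes x cells matrix. The helper `cluster_medians` and the toy matrix are hypothetical and not part of scmap, and the feature-selection step is skipped.

```r
# Hypothetical helper (not part of scmap): summarize each cluster by the
# per-gene median of its cells, roughly what a cluster centroid index stores.
cluster_medians <- function(logexpr, clusters) {
  stopifnot(ncol(logexpr) == length(clusters))
  # factor() keeps numeric cluster labels in numeric order (1, 2, ..., 15)
  groups <- split(seq_len(ncol(logexpr)), factor(clusters))
  # Result: one row per gene, one column per cluster
  vapply(groups,
         function(idx) apply(logexpr[, idx, drop = FALSE], 1, median),
         numeric(nrow(logexpr)))
}

# Toy example: 3 genes x 5 cells in two clusters
logexpr <- matrix(c(1, 2, 3, 8, 9,
                    0, 0, 1, 5, 5,
                    4, 4, 4, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:5)))
cluster_medians(logexpr, c(1, 1, 1, 2, 2))
```

Medians rather than means make each centroid less sensitive to a few outlying cells in a cluster.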



Set up reference dataset

I previously normalized the ImmGen microarray data with RMA and saved the result as immgen.rds. In this dataset, I collapsed the biological replicates (most cell types have 3) into a single data point by averaging after normalization. The result is a data frame of normalized, log2-transformed expression values, which I read in as the reference for scmap. immannot provides long names and metadata for the individual samples.
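Collapsing replicates by averaging in log2 space can be sketched in base R as below. The genes x samples layout and the `cell_type` grouping vector are assumptions about how immgen.rds was built, and `average_replicates` is a hypothetical helper. Note that the mean of log2 values is the log2 of the geometric mean of the intensities, not of their arithmetic mean.

```r
# Hypothetical helper: average replicate columns that share a cell-type label.
# Input: genes x samples matrix of RMA-normalized log2 values.
average_replicates <- function(log2expr, cell_type) {
  stopifnot(ncol(log2expr) == length(cell_type))
  # Keep cell types in the order they first appear
  groups <- split(seq_len(ncol(log2expr)),
                  factor(cell_type, levels = unique(cell_type)))
  # Result: one row per gene, one column per cell type
  vapply(groups,
         function(idx) rowMeans(log2expr[, idx, drop = FALSE]),
         numeric(nrow(log2expr)))
}

# Toy input: two genes, three replicates of one cell type and two of another
log2expr <- matrix(c(6, 7, 8, 10, 12,
                     2, 2, 5,  9,  9),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("GeneA", "GeneB"), NULL))
average_replicates(log2expr, c("TypeA", "TypeA", "TypeA", "TypeB", "TypeB"))
```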

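immgen <- readRDS("immgen.rds")      # first column GeneName, then one column per cell type
immannot <- readRDS("immannot.rds")  # one row of annotation per expression column

# scmap reads log-scale values from the logcounts assay
ref_sce <- SingleCellExperiment(assays = list(logcounts = as.matrix(immgen[, -1])),
                                colData = immannot)

rownames(ref_sce) <- immgen$GeneName
rowData(ref_sce)$feature_symbol <- immgen$GeneName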
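Because scmap matches genes between the index and the projection through the feature_symbol column, it can be worth checking how many symbols the two datasets share and whether the microarray gene names contain duplicates (several probe sets can map to the same gene). The helper below is a hypothetical base-R check on plain character vectors, for example rownames(sce) and immgen$GeneName; it is not a scmap function.

```r
# Hypothetical check (not a scmap function): compare gene symbols between
# the query and the reference before mapping.
check_symbols <- function(query_symbols, ref_symbols) {
  # Symbols that appear more than once in the reference
  dup_ref <- unique(ref_symbols[duplicated(ref_symbols)])
  # Symbols usable for matching
  shared <- intersect(query_symbols, ref_symbols)
  list(
    n_shared = length(shared),
    frac_query_shared = length(shared) / length(unique(query_symbols)),
    duplicated_in_ref = dup_ref
  )
}

# Toy example with made-up symbols
check_symbols(c("GeneA", "GeneB", "GeneC", "GeneD"),
              c("GeneA", "GeneB", "GeneB", "GeneE"))
```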

Map my own unlabeled data to the ImmGen reference

scmapCluster_results <- scmapCluster(
  projection = ref_sce,
  index_list = list(
    oconnell = metadata(sce)$scmap_cluster_index
  ),
  threshold = 0  # kept very low so that even weakly associated cell types are reported
)

# Prepare an informative, easy-to-explore data frame of the results
res_df <- data.frame(label = as.character(scmapCluster_results$scmap_cluster_labs),
                     similarity = as.character(scmapCluster_results$scmap_cluster_siml),
                     immgen_short = immannot$short_name,
                     immgen_type = immannot$reference_cell_type)


##  label        similarity   immgen_short immgen_type
##1    11 0.871871179824524 MF.11c-11b+.Lu  Macrophage
##2     8 0.491127572913492      MF.Alv.Lu  Macrophage
##3     9 0.666813607696411     Mo.6+2+.BL    Monocyte
##4     9 0.591040179641838    Mo.6+2+.MLN    Monocyte
##5     7 0.882301399545328    Mo.6+2+.SLN    Monocyte
##6     9 0.667127563564599     Mo.6+2-.BL    Monocyte
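In the call above, the clusters of sce form the index and ref_sce is the projection, so each row of res_df is one ImmGen sample labeled with the scRNA-seq cluster it most resembles. The simplified sketch below illustrates the centroid-matching idea using cosine similarity only; as far as I know scmap-cluster combines more than one similarity measure, so this is not its exact rule. `assign_to_centroids` and the toy matrices are hypothetical. A sample whose best similarity falls below `threshold` is reported as "unassigned", analogous to scmapCluster's threshold argument.

```r
# Hypothetical helper (not a scmap function): assign each column of `query`
# (genes x samples) to the most similar centroid (genes x clusters) using
# cosine similarity only.
assign_to_centroids <- function(query, centroids, threshold = 0.7) {
  genes <- intersect(rownames(query), rownames(centroids))
  q <- query[genes, , drop = FALSE]
  cen <- centroids[genes, , drop = FALSE]
  # Cosine similarity: dot product divided by the product of the L2 norms
  sims <- crossprod(q, cen) / outer(sqrt(colSums(q^2)), sqrt(colSums(cen^2)))
  best <- max.col(sims, ties.method = "first")
  best_sim <- sims[cbind(seq_len(nrow(sims)), best)]
  data.frame(sample = colnames(q),
             label = ifelse(best_sim >= threshold, colnames(cen)[best], "unassigned"),
             similarity = best_sim,
             stringsAsFactors = FALSE)
}

# Toy example: two genes, two cluster centroids, three query samples
centroids <- matrix(c(1, 0,
                      0, 1), nrow = 2,
                    dimnames = list(c("g1", "g2"), c("c1", "c2")))
query <- matrix(c(2, 0.1,
                  0.1, 3,
                  1, 1), nrow = 2,
                dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
assign_to_centroids(query, centroids, threshold = 0.8)
```

With threshold = 0, as in the posted call, every sample receives the label of its closest cluster, however weak the match.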

Can you see anything wrong here? Is it a problem to project between scRNA-seq data and a bulk reference?

Best, Atakan


Y'know, SingleR has an ImmGenData() function that would save you some trouble.


Hi Aaron,

Thanks for the info. I know about that function and have already used it with SingleR. My reason for asking is a bit different: I wrote a program of my own to help with cluster annotation, and I would like to compare its performance with SingleR and scmap. To do that, I want to run scmap on exactly the same input my software uses (ImmGen microarray data processed the same way).

atakanekiz

Copying the response I got from Dr. Martin Hemberg (the author of the package):


"We have not tested scmap for mapping to bulk references, so I have no idea how well it will perform. My guess, however, is that it will not work very well as the gene selection step (which is critical for good results) is based on the dropout characteristic of scRNA-seq data. As this property is not a major feature of bulk data, I am concerned that the algorithm will end up making poor choices with regards to what genes to use (and consequently poor mapping). At the end of the day though the proof is in the pudding, so if you end up getting sensible results then you should go with the approach that you have outlined."
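To make the dropout concern concrete, here is a base-R sketch of dropout-based gene selection, assuming it works roughly like fitting log dropout rate against mean expression and keeping the genes with the largest positive residuals. The helper `dropout_features` is hypothetical and is not scmap's internal code. RMA-normalized microarray values are log2 intensities with essentially no zeros, so every gene has a 0% dropout rate and nothing is left to model. (In the code posted above, selectFeatures() is run on the scRNA-seq object rather than on the microarray reference.)

```r
# Hypothetical helper, loosely modeled on dropout-based feature selection.
# It is NOT scmap's internal code; it only illustrates the idea.
dropout_features <- function(counts, n_features = 500) {
  # Percentage of samples in which each gene has a zero value
  dropout_pct <- rowMeans(counts == 0) * 100
  # Genes detected everywhere (0%) or nowhere (100%) give the model no signal
  keep <- dropout_pct > 0 & dropout_pct < 100
  if (sum(keep) < 3) {
    return(character(0))  # too few informative genes to fit a line
  }
  log_drop <- log2(dropout_pct[keep])
  log_expr <- log2(rowMeans(counts[keep, , drop = FALSE]) + 1)
  fit <- lm(log_drop ~ log_expr)
  # Positive residuals: genes dropping out more often than their expression predicts
  res <- residuals(fit)
  names(res) <- rownames(counts)[keep]
  names(head(sort(res, decreasing = TRUE), n_features))
}

# Bulk-like toy data with no zeros: every gene is filtered out
bulk <- matrix(c(5, 6, 7, 8, 9, 10, 3, 4, 5, 11, 12, 13), nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), NULL))
dropout_features(bulk, n_features = 2)  # character(0)
```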

