Question

SingleR analysis with just a subset of ImmGen data

dangm ▴ 10

@dangm-23230

Last seen 5.4 years ago

Figure 3a of the Aran et al. Nature Immunology paper describing SingleR used only the subset of ImmGen data that covers macrophages and monocytes. How can I analyze my data using just that subset?

annotation • 2.2k views

ADD COMMENT • link updated 5.6 years ago by Aaron Lun ★ 29k • written 5.6 years ago by dangm ▴ 10

score 0 · Answer 1 · 2020-04-01

Aaron Lun ★ 29k

@alun

Last seen 2 hours ago

The city by the bay

The output of ImmGenData() is a SummarizedExperiment object, so you can just subset it to your desired labels before passing that into SingleR(). For example:

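# ImmGenData() lived in SingleR when this was posted; in later Bioconductor
# releases it is provided by the celldex package, so load whichever applies.
se <- ImmGenData()
se.sub <- se[,se$label.main %in% c("Monocytes", "Macrophages")]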

And that's your new reference set.
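One thing to watch with this kind of subsetting is that `%in%` silently returns an empty reference if a label is misspelled. The sketch below guards against that. `subset_reference` is a hypothetical helper (it is not part of SingleR), and the toy object with made-up values only stands in for `ImmGenData()` so that the snippet runs without a download.

```r
library(SummarizedExperiment)

# Hypothetical helper, not part of SingleR: keep only the reference samples
# whose label is in `keep`, and stop if any requested label is absent.
subset_reference <- function(ref, keep, label.col = "label.main") {
    labels <- colData(ref)[[label.col]]
    absent <- setdiff(keep, labels)
    if (length(absent) > 0) {
        stop("labels not found in reference: ", paste(absent, collapse = ", "))
    }
    ref[, labels %in% keep]
}

# Toy stand-in for ImmGenData(): 5 genes by 6 samples with made-up values.
toy.labels <- c("Monocytes", "Macrophages", "B cells",
                "T cells", "Monocytes", "Macrophages")
toy.ref <- SummarizedExperiment(
    assays = list(logcounts = matrix(as.numeric(seq_len(30)), nrow = 5)),
    colData = DataFrame(label.main = toy.labels)
)

toy.sub <- subset_reference(toy.ref, c("Monocytes", "Macrophages"))
table(toy.sub$label.main)
```

With the real data this would presumably be `se.sub <- subset_reference(ImmGenData(), c("Monocytes", "Macrophages"))`, followed by `SingleR(test = my.sce, ref = se.sub, labels = se.sub$label.main)`, where `my.sce` is a placeholder for your own dataset.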

ADD COMMENT • link 5.6 years ago Aaron Lun ★ 29k

Thank you. That worked well. Would you kindly also comment on whether the following is possible in SingleR? I used plotHeatmap(UnMgMo.sce, order_columns_by="labels", features=unique(unlist(all.markers$Microglia))) to display genes that are microglia-specific, and I get an extremely large list that is very difficult to read. Is there a way to limit this gene list for each cell type to, say, the top 50 genes that are most different among my samples?

ADD REPLY • link 5.6 years ago dangm ▴ 10

If by "most different", you mean "most variable":

library(scran)
candidates <- unique(unlist(all.markers$Microglia))
hvg.out <- modelGeneVar(UnMgMo.sce)
to.use <- getTopHVGs(hvg.out[candidates,], n=50)

On the other hand, if you're talking about DE genes between labels:

de.out <- pairwiseWilcox(logcounts(UnMgMo.sce), UnMgMo.sce$label, 
    subset.row=candidates)

# Chose n=10 here because it's tricky to get exactly 50 genes in total
# for an unknown number of pairwise comparisons between labels.
to.use <- unlist(unlist(getTopMarkers(de.out[[1]], de.out[[2]], n=10)))

Note the unfortunate need for two unlist() calls; this is a known bug in S4Vectors.
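To see why the total is hard to pin at exactly 50, here is a small sketch with made-up gene names. A plain base R list stands in for the nested structure that getTopMarkers() returns (one entry per label, each holding the top genes against every other label). On a base list a single unlist() is enough, so the S4Vectors quirk is not reproduced here. The same gene can be a top marker in several comparisons, so deduplicating with unique() is assumed to be what you want before plotting.

```r
# Toy stand-in for getTopMarkers() output; gene names are invented.
top.markers <- list(
    Microglia   = list(Monocytes = c("GeneA", "GeneB"),
                       Macrophages = c("GeneA", "GeneC")),
    Monocytes   = list(Microglia = c("GeneD", "GeneB"),
                       Macrophages = c("GeneD", "GeneE")),
    Macrophages = list(Microglia = c("GeneF", "GeneC"),
                       Monocytes = c("GeneF", "GeneE"))
)

# Flatten every pairwise comparison into one character vector.
all.genes <- unlist(top.markers, use.names = FALSE)

# Drop repeats so each gene appears once in the heatmap.
to.use <- unique(all.genes)

length(all.genes)  # number of entries before deduplication
length(to.use)     # number of distinct genes actually plotted
```

The deduplicated vector can then go back into the original call, for example `plotHeatmap(UnMgMo.sce, order_columns_by="labels", features=to.use)`.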

ADD REPLY • link 5.6 years ago Aaron Lun ★ 29k

Thank you again. For the first suggestion, by "most variable", do you mean most variable between the standard cell types, whereas the second gives the genes that are most different between labels? If so, then I was looking for the first: basically, genes that are most specific to each of the standard cell types.

ADD REPLY • link 5.6 years ago dangm ▴ 10

If you want to operate on the reference cell types, just swap UnMgMo.sce and UnMgMo.sce$label for se.sub and se.sub$label.main from my original post.

ADD REPLY • link 5.6 years ago Aaron Lun ★ 29k

Thank you. That didn't work for me. Does it matter that se.sub is a Large SummarizedExperiment object and not a SingleCellExperiment?

ADD REPLY • link 5.6 years ago dangm ▴ 10

It would be nice if you were a bit more specific about why it didn't work.

Fortunately, I can guess why. The current version of scran expects SingleCellExperiment objects but se.sub is a SummarizedExperiment object. No problem, just upgrade the object before using the functions.

sce.sub <- as(se.sub, "SingleCellExperiment")

The next version of scran will natively support both objects, so you won't need to do that.
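If the same script has to run on both older and newer scran versions, a small guard avoids converting twice. This is only a sketch: `as_sce` is a hypothetical helper name, and the toy SummarizedExperiment with made-up values stands in for se.sub.

```r
library(SingleCellExperiment)  # also attaches SummarizedExperiment

# Hypothetical helper: upgrade to SingleCellExperiment only when needed.
as_sce <- function(x) {
    if (is(x, "SingleCellExperiment")) {
        return(x)
    }
    as(x, "SingleCellExperiment")
}

# Toy stand-in for se.sub: 10 genes by 4 reference samples.
toy.se <- SummarizedExperiment(
    assays = list(logcounts = matrix(as.numeric(seq_len(40)), nrow = 10)),
    colData = DataFrame(label.main = rep(c("Monocytes", "Macrophages"), each = 2))
)

toy.sce <- as_sce(toy.se)

class(toy.sce)      # the upgraded class
toy.sce$label.main  # column metadata survives the conversion
```

With the real reference, `sce.sub <- as_sce(se.sub)` can then be passed to the earlier calls, e.g. `modelGeneVar(sce.sub)` and `pairwiseWilcox(logcounts(sce.sub), sce.sub$label.main, subset.row=candidates)`.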