Hi,

I have been using SingleR to predict the cell types of my samples. I use the reference dataset from HumanPrimaryCellAtlasData() with the following code:

hpca.se <- HumanPrimaryCellAtlasData()

out <- pairwiseWilcoxhpca.se@assays@data$logcounts, hpca.se$label.main, direction="up")
markers <- getTopMarkers(out$statistics, out$pairs, n=10)


When I try to check the marker genes of each cell type using, for example,

markers$Neurons$B-cells


I get

character(0)


And it is the same for all pairs of cell types in the reference dataset.

Any suggestions? Thanks!

Somebody else may answer, but what is the output of str(markers$Neurons) and/or even just str(markers)?

Aaron Lun:

Works for me (after fixing your syntax errors):

library(SingleR)
hpca.se <- HumanPrimaryCellAtlasData()

# Note that it is poor practice to use '@' in analysis code. Use
# assay(hpca.se, "logcounts") instead, or even simpler:
library(scran)
out <- pairwiseWilcox(hpca.se, hpca.se$label.main, direction="up")

markers <- getTopMarkers(out$statistics, out$pairs, n=10)

markers$Neurons$B_cell
##  [1] "ANK2"     "ARHGEF40" "CDH2"     "EFR3B"    "FAM168A"  "HEY1"
##  [7] "INTU"     "KIF21A"   "LRP11"    "MBOAT2"
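
Note that the label used in the working call above is B_cell, not B-cells. As a minimal sketch of how to check which names are actually available before indexing, here is a toy example in base R. The object markers_toy and its contents are hypothetical stand-ins (a plain nested list, not the real getTopMarkers() output, which may behave differently for missing names):

```r
# Hypothetical toy stand-in for a nested list of marker genes.
markers_toy <- list(
  Neurons = list(
    B_cell  = c("ANK2", "CDH2"),
    T_cells = c("HEY1")
  )
)

# List the labels that actually exist before trying to index by name.
labels <- names(markers_toy$Neurons)

# Indexing with [[ ]] and a quoted string avoids parsing surprises:
# an unquoted name containing '-' after '$' is read as a subtraction.
hit  <- markers_toy[["Neurons"]][["B_cell"]]

# A label that does not exist in a plain list returns NULL.
miss <- markers_toy[["Neurons"]][["B-cells"]]
```

With the real reference, the equivalent check would be to inspect names(markers$Neurons) or unique(hpca.se$label.main) and copy the label from there.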


I should add that, in my opinion, there's not much point in using the Wilcoxon rank sum test for the HPCA data; this is a bulk microarray reference and there aren't enough samples to give you a fine-grained ordering of candidate markers. For example, I reckon if you looked inside out$statistics, you would find many of the top genes stuck on the same p-value because there just aren't enough permutations of ranks to distinguish them. You'll just end up with an arbitrary choice of the top n=10 in such cases. It is better to use pairwiseTTests(), which is more responsive to the effect size.
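
To make the point about rank permutations concrete, here is a small base R sketch. The data are made-up toy values (three samples per group, not taken from HPCA), and min_p is a hypothetical helper introduced only for illustration. With perfectly separated groups, the exact one-sided Wilcoxon p-value cannot go below 1 / choose(n1 + n2, n1), however large the difference is, whereas the t-test p-value keeps shrinking as the gap widens:

```r
# Smallest attainable one-sided exact Wilcoxon p-value for two groups of
# sizes n1 and n2 without ties: only one of the choose(n1 + n2, n1)
# equally likely rank assignments is the most extreme one.
min_p <- function(n1, n2) 1 / choose(n1 + n2, n1)

# Toy data (assumed values): 3 samples per group, perfectly separated.
x <- c(10.1, 11.3, 12.7)
y <- c(1.2, 2.5, 3.9)

# Wilcoxon test on a modest shift and on a much larger shift.
p_small <- wilcox.test(x, y, alternative = "greater", exact = TRUE)$p.value
p_huge  <- wilcox.test(x + 1000, y, alternative = "greater", exact = TRUE)$p.value

# Welch t-test on the same two comparisons.
t_small <- t.test(x, y, alternative = "greater")$p.value
t_huge  <- t.test(x + 1000, y, alternative = "greater")$p.value

# p_small and p_huge are identical and equal to min_p(3, 3);
# t_huge is smaller than t_small because the t-test sees the effect size.
```

Any gene whose groups are perfectly separated ends up with the same Wilcoxon p-value, so the ranking among such genes is arbitrary. In the analysis above, swapping pairwiseWilcox() for pairwiseTTests() with the same arguments would be the corresponding change.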

Thank you very much!!!