Aberrant number of cells
0
0
Entering edit mode
@d9ae38d8
Last seen 3 months ago
Belgium

Hi !

I've tried to compare the output from Cellranger 3.0 with DropletUtils::emptyDrops() in term of number of cell-containing droplets. The web-summary from the Cellranger run outputs 1240 cells which is indeed very consistant when looking at the UMI Vs ranked-barcodes plot (the drop of the curve is around the 1100-1300th barcode). It is far less than I expected with regard to the number of cells I've put as input but this could be explain by some technical reasons I guess (wrong cell counting, bad cell lysis).

However when running DropletUtils::emptyDrops() on the unfiltered Cellranger matrix.h5, I obtain very high number of cell-containing barcodes (15307 !!). This is with default parameters but I've tried to adjust and it doesn't change anything (see following code). How comes that DropletUtils::emptyDrops() outputs so different numbers than cell ranger 3.0? What should I do?

Thanks a lot !


sce
#CASE 1 use emptyDrops with default parameters : it will retain every cells above knee of UMI Vs ranked-barcode plot
counts <- counts(sce)
tic()
set.seed(100)
e.out.1 <- DropletUtils::emptyDrops(counts,BPPARAM = SerialParam())
summary(e.out.1$FDR < 0.001) sum(e.out.1$FDR <= 0.001, na.rm=TRUE)
toc()

# RESULT = 15307

#CASE 2 use emptyDrops with retain=inf: it will retain every cells above inflection point of UMI Vs ranked-barcode plot
tic()
set.seed(100)
e.out.2 <- DropletUtils::emptyDrops(counts, retain = Inf)
summary(e.out.2$FDR < 0.001) sum(e.out.2$FDR <= 0.001, na.rm=TRUE)
toc()

# RESULT = 15307

#CASE 3 use emptyDrops on counts matrix without ribosomal or mitochondrial genes that could be captures in non-cell-containing drops
qc <- list(Mito = grep("^MT-",rowData(sce)$Symbol), Ribo = grep("^RPS",rowData(sce)$Symbol)
)
keep <- !row.names(counts) %in% unlist(qc)

tic()
set.seed(100)
e.out.3 <- DropletUtils::emptyDrops(counts[keep, ])
summary(e.out.3$FDR < 0.001) sum(e.out.3$FDR <= 0.001, na.rm=TRUE)
toc()

# RESULT = 15307

#CASE 4 use emptyDrops with lower =200 : in case the knee is artificially put in small bump of the curve at high ranked-barcode position
tic()
set.seed(100)
e.out.4 <- DropletUtils::emptyDrops(counts,lower =200)
summary(e.out.4$FDR < 0.001) sum(e.out.4$FDR <= 0.001, na.rm=TRUE)
toc()

# RESULT = 5960

0
Entering edit mode

I think emptyDrops function will always retain more cells as compared to the cell ranger pipeline as shown here. You can check using Seurat QC functions if the additional cells that are being rescued are actual cells or doublets. If you want to get similar number of cells you can use defaultDrops function of DropletUtils package as mentioned here. But using defaultDrops will kill the entire purpose of using raw matrices and performing custom filtering.

Aron is a wonderful developer and he always write functions which require less user intervention in function arguments. So stick to default if you are not sure what you are doing. Besides, the aforementioned link states:

Users wanting to use the cell calling algorithm from the CellRanger pipeline can call defaultDrops() instead. This tends to be quite conservative as it often discards genuine cells with low RNA content (and thus low total counts). It also requires an estimate of the expected number of cells in the experiment.

I use the following function for most of my datasets:

empty_filt <- function(sceobj){
e.out <- emptyDrops(counts(sceobj),retain = Inf)
sceobj <- sceobj[,which(e.out\$FDR <= 0.01)]
return(sceobj)}


If you have data where the low number of genes are expressed (like single cells at ring stage of Plasmodium) altering the lower parameter would be necessary. However, usually the default threshold works pretty well!! I see that you are filtering a lot of cells using the lower argument, and I would like to point you toward this extensive discussion on the effect of usage of this parameter here