Question: emptyDrops identified less non-empty cells with lower total UMI counts?
11 weeks ago by
C T90
United States
C T90 wrote:

Hello,

I hope someone can help me understand this better. I am running the emptyDrops method from the DropletUtils package, which distinguishes empty droplets from non-empty ones. The function accepts a lower parameter, which specifies the lower bound on the total UMI count, at or below which all barcodes are assumed to correspond to empty droplets (see the manual).

Based on this, I thought that specifying a small number for lower would detect more non-empty cells, and that a bigger number would detect fewer. However, it didn't. In fact, it consistently detects more non-empty cells when lower is set to a larger number. Am I missing something? Here is the code to reproduce the results.

library(DropletUtils)
library(BiocFileCache)
bfc <- BiocFileCache("raw_data", ask = FALSE)
raw.path <- bfcrpath(bfc, file.path("http://cf.10xgenomics.com/samples",
    "cell-exp/2.1.0/pbmc4k/pbmc4k_raw_gene_bc_matrices.tar.gz"))
untar(raw.path, exdir=file.path(tempdir(), "pbmc4k"))
fname <- file.path(tempdir(), "pbmc4k/raw_gene_bc_matrices/GRCh38")
# Not shown in the original post; presumably how 'sce' was created.
sce <- read10xCounts(fname, col.names=TRUE)

# Seed 100, increasing values of 'lower'.
set.seed(100)
e.out <- emptyDrops(counts(sce),lower=100)
sum(e.out$FDR <= 0.001, na.rm=TRUE)
[1] 4237

set.seed(100)
e.out.200 <- emptyDrops(counts(sce),lower=200)
sum(e.out.200$FDR <= 0.001, na.rm=TRUE)
[1] 4346

set.seed(100)
e.out.1000 <- emptyDrops(counts(sce),lower=1000)
sum(e.out.1000$FDR <= 0.001, na.rm=TRUE)
[1] 4326

# Same comparison with a different seed.
set.seed(123)
e.out.test <- emptyDrops(counts(sce),lower=100)
sum(e.out.test$FDR <= 0.001, na.rm=TRUE)
[1] 4223

set.seed(123)
e.out.test <- emptyDrops(counts(sce),lower=200)
sum(e.out.test$FDR <= 0.001, na.rm=TRUE)
[1] 4316

set.seed(123)
e.out.test <- emptyDrops(counts(sce),lower=1000)
sum(e.out.test$FDR <= 0.001, na.rm=TRUE)
[1] 4324

Answer: emptyDrops identified less non-empty cells with lower total UMI counts?
11 weeks ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

lower is used to estimate the "expression" profile of the ambient pool; changing lower just means that you're using more or fewer low-count libraries to do this estimation. Provided all of the low-count libraries are empty droplets, the ambient proportion estimates should not change much from using more libraries. The fact that the number of detected cells is stable is actually encouraging as it means that the analysis is robust to the choice of lower, i.e., exactly how empty droplets are defined doesn't matter much.
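
To make the first point concrete, here is a minimal base R sketch on simulated data. It is not what emptyDrops does internally (the real function applies additional smoothing to the proportions, which is omitted here); the ambient profile, the droplet totals and the helper estimate_ambient are all hypothetical. It only shows that pooling more empty droplets barely changes the estimated gene proportions, as long as every pooled droplet really is empty.

```r
set.seed(1)
n.genes <- 50

# Hypothetical ambient profile: gene proportions in the ambient pool.
ambient <- rgamma(n.genes, shape = 1)
ambient <- ambient / sum(ambient)

# Simulate 2000 empty droplets with total counts between 10 and 300,
# each one sampled from the same ambient profile.
totals <- sample(10:300, 2000, replace = TRUE)
counts <- vapply(totals, function(tot) rmultinom(1, tot, ambient)[, 1],
                 integer(n.genes))

# Hypothetical helper: pool all barcodes with totals at or below 'lower'
# and convert the pooled counts to proportions (no smoothing).
estimate_ambient <- function(counts, lower) {
  is.empty <- colSums(counts) <= lower
  pooled <- rowSums(counts[, is.empty, drop = FALSE])
  pooled / sum(pooled)
}

p100 <- estimate_ambient(counts, lower = 100)
p200 <- estimate_ambient(counts, lower = 200)

cor(p100, p200)        # agreement between the two estimates
max(abs(p100 - p200))  # largest per-gene difference
```

The two estimates should be nearly identical here because every simulated droplet is empty; they would only diverge if a larger lower started pulling real cells into the pool.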

A secondary effect is that libraries with counts below lower are not tested by default, as we're already assuming that they're empty droplets (so why bother testing them?). This means that there is a small reduction in the multiple testing burden as you increase lower. One would expect this to result in a slight increase in the number of detected cells, though this really depends on whether you start throwing away more cells with total counts below lower, and in this case, it all comes out in the wash.
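
The multiple testing effect can be sketched with base R's p.adjust. This is an illustration under stated assumptions, not a re-implementation of emptyDrops: the p-values are simulated, the numbers of cells and empty droplets are made up, and raising lower is modelled simply as skipping half of the empty droplets.

```r
set.seed(42)
n.cells <- 500
n.empty <- 20000

# Hypothetical p-values: real cells get small p-values (log-uniform between
# 1e-8 and 1e-2), empty droplets get uniform p-values under the null.
p.cells <- 10^(-runif(n.cells, 2, 8))
p.empty <- runif(n.empty)
p.all <- c(p.cells, p.empty)
is.cell <- rep(c(TRUE, FALSE), c(n.cells, n.empty))

# Small 'lower': every barcode above it is tested.
fdr.low <- p.adjust(p.all, method = "BH")
n.low <- sum(fdr.low[is.cell] <= 0.001)

# Larger 'lower': assume half of the empty droplets now fall at or below it,
# so they are skipped and no longer count towards the correction.
skipped <- !is.cell & seq_along(p.all) > n.cells + n.empty / 2
tested <- !skipped
fdr.high <- p.adjust(p.all[tested], method = "BH")
n.high <- sum(fdr.high[is.cell[tested]] <= 0.001)

c(small.lower = n.low, large.lower = n.high)
```

In this setup no real cells are lost when lower increases, so the number of detected cells can only go up. In real data, a larger lower may also discard genuine low-count cells, which pushes the other way.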

P.S. I'll give you the benefit of the doubt and assume you're not the same person who posted this, but if you are, it is usually good etiquette to wait a reasonable timeframe before reposting.

Hi Aaron,

Thank you very much for your clear explanation. I have one more related question: your F1000Research paper didn't have a cell calling step. It makes me wonder whether there are specific cases where you don't need to do cell calling. Thank you!

Well, y'know, back in my day, we didn't have fancy droplet methods for scRNA-seq. We had microwell plates. Plates, and a pipette, for the entire lab. And we had to share the pipette!

More seriously, all of the data in the paper was taken from plate-based methods (or close to it, e.g., C1), where we were pretty sure of getting a cell from each library. So there wasn't any need for an explicit cell calling step, though obviously quality control was still required to protect against damaged cells or cell fragments. You might get an occasional accidentally empty well, but that was very much a minority and would be caught by QC using outliers. By comparison, in droplet-based methods, the empty droplets are the majority so an outlier-based QC method would fail.
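
The contrast between the two settings can be sketched in base R with a simple median/MAD rule on simulated log10 total counts. Everything here is an assumption for illustration: the helper low_outlier, the 3-MAD threshold, and the simulated library sizes are hypothetical and not taken from the paper or from any package.

```r
set.seed(7)

# Hypothetical outlier rule: flag libraries whose log-total is more than
# 'nmads' median absolute deviations below the median.
low_outlier <- function(x, nmads = 3) {
  x < median(x) - nmads * mad(x)
}

# Plate-like data: 94 wells with a cell and 2 accidentally empty wells.
plate <- c(rnorm(94, mean = 5, sd = 0.2), rnorm(2, mean = 1, sd = 0.2))
plate.is.empty <- rep(c(FALSE, TRUE), c(94, 2))

# Droplet-like data: 4000 cells and 50000 empty droplets.
droplet <- c(rnorm(4000, mean = 3.5, sd = 0.3),
             rnorm(50000, mean = 1, sd = 0.2))
droplet.is.empty <- rep(c(FALSE, TRUE), c(4000, 50000))

plate.flagged <- low_outlier(plate)
droplet.flagged <- low_outlier(droplet)

# Fraction of empty libraries that the outlier rule removes in each setting.
mean(plate.flagged[plate.is.empty])
mean(droplet.flagged[droplet.is.empty])
```

On the plate-like data the median sits among the cells, so the empty wells stand out as low outliers. On the droplet-like data the median sits among the empty droplets, so almost none of them are flagged, which is why an explicit cell calling step is needed there.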