Question: In which order should you detect empty droplets / remove barcode swapping using DropletUtils?
jma199130 wrote (6 months ago):

In which order should you detect empty droplets and remove barcode swapping? The DropletUtils vignette says you should use all barcodes for empty droplet detection, so I assume this is the first step. Naively, though, I thought that if the amount of barcode swapping was large, the counts for each barcode could be very different after the correction, which would also affect empty droplet detection.

Specifically, I have 4 samples from 10x scRNA-seq which have all been sequenced together on the same lane of the Illumina 4000. My current workflow is to do the following:

1. Detect empty droplets for each sample independently using the raw barcode matrix files (do not filter cells afterward)
2. Detect barcode swapping amongst all samples using the molecule information files (the function returns a filtered matrix where column sums are not zero)
3. Assign the counts from barcode swapping to the raw barcode matrix files

Would this be reasonable?

Answer: In which order should you detect empty droplets / remove barcode swapping using
Aaron Lun (Cambridge, United Kingdom) wrote (6 months ago):

The "all barcodes" part of the documentation is just telling you to not filter on the cell barcodes, i.e., don't call cells with some other method before using emptyDrops. It doesn't mean you have to keep all reads for a given barcode.

The correct approach is to treat the barcode swapping removal step as part of the pre-processing to get the count matrix in the first place. You should do this before any cell calling - because barcode swapping occurs regardless of whether or not you have cells! - and then use the de-swapped matrix for all downstream analysis.

Your current approach puts unnecessary pressure on emptyDrops, which would find it harder to make the right calls if you have a lot of swapping between samples. If you clean up the count matrix with swappedDrops first, the estimates of - well, everything - should be more accurate, which in turn improves all downstream analyses.
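
To make the ordering concrete, here is a minimal toy sketch of why de-swapping should come before cell calling. It is written in Python purely as a self-contained illustration; it is not R and not the DropletUtils implementation. It assumes each molecule is a record of (sample, cell barcode, UMI, gene, reads), and that a molecule seen in several samples is kept only in the sample that holds at least a given fraction of its reads and is otherwise dropped from all of them. The function names, the record layout, the toy data and the 0.8 threshold (modelled on what I believe is the min.frac argument of swappedDrops) are all assumptions made for this example.

```python
from collections import defaultdict


def remove_swapped(molecules, min_frac=0.8):
    """Toy de-swapping: keep each (barcode, UMI, gene) molecule only in the
    sample that holds at least `min_frac` of its reads (hypothetical helper)."""
    reads_by_key = defaultdict(dict)
    for sample, barcode, umi, gene, reads in molecules:
        key = (barcode, umi, gene)
        reads_by_key[key][sample] = reads_by_key[key].get(sample, 0) + reads

    cleaned = []
    for (barcode, umi, gene), per_sample in reads_by_key.items():
        total = sum(per_sample.values())
        best_sample, best_reads = max(per_sample.items(), key=lambda kv: kv[1])
        if best_reads >= min_frac * total:
            cleaned.append((best_sample, barcode, umi, gene, best_reads))
        # Otherwise the molecule is ambiguous and is dropped from every sample.
    return cleaned


def umi_totals(molecules):
    """Number of molecules per (sample, barcode): the totals a cell caller sees."""
    totals = defaultdict(int)
    for sample, barcode, _umi, _gene, _reads in molecules:
        totals[(sample, barcode)] += 1
    return dict(totals)


# Toy data: barcode AAAC is a real cell in sample A only; sample B picks up
# low-read swapped copies of its molecules.
molecules = [
    ("A", "AAAC", "U1", "Gene1", 20),
    ("B", "AAAC", "U1", "Gene1", 1),   # swapped copy
    ("A", "AAAC", "U2", "Gene2", 15),
    ("B", "AAAC", "U2", "Gene2", 2),   # swapped copy
    ("B", "TTTG", "U3", "Gene1", 12),  # genuine molecule in sample B
    ("A", "CCCA", "U4", "Gene3", 5),
    ("B", "CCCA", "U4", "Gene3", 5),   # 50/50 split: dropped from both samples
]

raw_totals = umi_totals(molecules)
clean_totals = umi_totals(remove_swapped(molecules))

print("before de-swapping:", raw_totals)
print("after de-swapping: ", clean_totals)
```

Before cleaning, barcode AAAC in sample B has the same molecule total as the genuine cell in sample A, which is exactly the kind of inflated total that makes cell calling harder. After cleaning, that barcode disappears from sample B. In a real analysis the equivalent order would be to run swappedDrops on the molecule information files for all samples on the lane, and then run emptyDrops on each cleaned matrix.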