Question: In which order should you detect empty droplets / remove barcode swapping using DropletUtils?
6 months ago, jma1991 wrote:

In which order should you detect empty droplets / remove barcode swapping? The DropletUtils vignette says you should use all barcodes for empty droplet detection, so I assume this is the first step? Naively, I thought that if the amount of barcode swapping was large, the counts for each barcode could be very different after the correction, which would also affect empty droplet detection.

Specifically, I have 4 samples from 10x scRNA-seq which have all been sequenced together on the same lane of the Illumina 4000. My current workflow is to do the following:

  1. Detect empty droplets for each sample independently using the raw barcode matrix files (do not filter cells afterward)
  2. Detect barcode swapping amongst all samples using the molecule information files (the function returns a filtered matrix where column sums are not zero)
  3. Assign the counts from barcode swapping to the raw barcode matrix files

Would this be reasonable?

Answer: In which order should you detect empty droplets / remove barcode swapping using
6 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

The "all barcodes" part of the documentation is just telling you to not filter on the cell barcodes, i.e., don't call cells with some other method before using emptyDrops. It doesn't mean you have to keep all reads for a given barcode.

The correct approach is to treat the barcode swapping removal step as part of the pre-processing to get the count matrix in the first place. You should do this before any cell calling - because barcode swapping occurs regardless of whether or not you have cells! - and then use the de-swapped matrix for all downstream analysis.

Your current approach puts unnecessary pressure on emptyDrops, which would find it harder to make the right calls if you have a lot of swapping between samples. If you clean up the count matrix with swappedDrops first, the estimates of - well, everything - should be more accurate, which should improve all downstream analyses.
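
As a rough illustration of that order (de-swap first, then call cells), here is a minimal sketch. It assumes the samples were multiplexed on the same lane and that you have one molecule information file per sample. The function name deswap_then_call_cells, the example file paths, and the 0.001 FDR threshold are my own placeholders rather than anything prescribed above, so adjust them to your data.

```r
# Sketch: remove swapped molecules first, then call cells on the cleaned counts.
# Requires the Bioconductor package DropletUtils to be installed.
deswap_then_call_cells <- function(mol_info_paths, fdr = 0.001) {
    # `mol_info_paths`: character vector of molecule information files,
    # one per sample, for all samples sequenced together on the same lane.
    swapped <- DropletUtils::swappedDrops(mol_info_paths)

    # `swapped$cleaned` holds one de-swapped count matrix per sample.
    lapply(swapped$cleaned, function(counts) {
        # Run emptyDrops on every barcode in the cleaned matrix;
        # do not pre-filter to putative cells with another method.
        e.out <- DropletUtils::emptyDrops(counts)

        # which() skips barcodes whose FDR is NA (totals at or below `lower`).
        is.cell <- which(e.out$FDR <= fdr)
        counts[, is.cell, drop = FALSE]
    })
}

# Hypothetical usage; these paths are placeholders, not from the original post:
# paths <- file.path(c("sample1", "sample2", "sample3", "sample4"),
#                    "molecule_info.h5")
# set.seed(100)  # emptyDrops computes Monte Carlo p-values
# cell_counts <- deswap_then_call_cells(paths)
```

The cleaned matrices are used as-is for emptyDrops here; there is no need to map them back onto the original raw barcode matrices.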


Okay, that's cleared it up for me! I was getting confused because swappedDrops was returning a matrix with fewer columns (i.e. barcodes), and I thought this would interfere with the "all barcodes" bit written in the vignette for emptyDrops. I was then substituting the columns from the cleaned matrix back into the raw count matrix before using emptyDrops (I knew at that point I was probably doing something wrong). I'll just run swappedDrops and use the cleaned matrix output from the function as the raw count matrix. Many thanks, Aaron!

written 6 months ago by s14376430