10X FeatureBarcode producing too many "cells"
msn • 0

I am analyzing single-cell human data that includes Feature Barcodes (10X's CITE-seq) for a few cell-surface antibodies plus 4 hashtags. The pipeline goes like this: 10,000 cells into the 10X --> NovaSeq --> bcl2fastq --> cellranger count

When I import the raw seq data and the seq + Feature Barcode data into SingleCellExperiment in R, I get the same number of "cells" (737,280) for both. When I remove the empty droplets from the seq data, about 6,000 remain, a perfectly reasonable return for the experiment. But when I remove the empty droplets from the Feature Barcode + seq data, I get 18,590, far more than expected.

With the addition of only 12 more features (8 cell-surface markers, 4 hashtags), what is making such a big difference?

As a solution, do you think it's fair to just subset the 12 features out, remove the empty droplets without them, then put the data back in for the remaining "cells" and continue with downstream analysis? Or should I try to track down where the more than 10,000 extra cells are coming from?

SingleCellExperiment DropletUtils SingleCellData • 85 views
Aaron Lun ♦ 26k

tl;dr just use the RNA counts.

This is consistent with other observations; the ADT/HTO counts do not behave similarly to the RNA counts in the statistical model used by emptyDrops. I speculate that this is because protein aggregates increase the variability of the ADT counts compared to the per-molecule sampling that you get for RNA data. When you try to model both sets of features together, the variability of the ADT counts is understated, causing the algorithm to call an excessive number of cells.

That said, you _can_ use the ADT counts for calling cells. Just don't use them together with the RNA counts, because the two sets of features have different variabilities - see comments here.
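To make the bookkeeping concrete, here is a minimal toy sketch of "call cells on the RNA counts only, then keep all features for the barcodes that survive". It is an illustration under stated assumptions, not the emptyDrops algorithm: a plain cutoff on the per-barcode RNA total stands in for real cell calling (in R you would run `emptyDrops` from DropletUtils on the RNA rows instead). The function names, the `min_rna_total` cutoff and the tiny count matrix are all hypothetical; the feature-type labels are assumed to follow Cell Ranger's "Gene Expression" / "Antibody Capture" naming.

```python
# Toy illustration only: a simple total-count cutoff stands in for a real
# cell-calling method such as emptyDrops. All names and numbers are made up.

def call_cells_on_rna(counts, feature_types, min_rna_total):
    """Return indices of barcodes called as cells, using RNA rows only.

    counts: list of rows (one per feature), each a list of per-barcode counts.
    feature_types: one label per row, e.g. "Gene Expression".
    min_rna_total: hypothetical cutoff on the per-barcode RNA total.
    """
    # Drop the ADT/HTO rows before computing anything used for cell calling.
    rna_rows = [row for row, ftype in zip(counts, feature_types)
                if ftype == "Gene Expression"]
    n_barcodes = len(counts[0])
    rna_totals = [sum(row[j] for row in rna_rows) for j in range(n_barcodes)]
    return [j for j, total in enumerate(rna_totals) if total >= min_rna_total]


def subset_barcodes(counts, keep):
    """Keep every feature (RNA, ADT and HTO) but only the selected barcodes."""
    return [[row[j] for j in keep] for row in counts]


feature_types = ["Gene Expression", "Gene Expression",
                 "Antibody Capture", "Antibody Capture"]
counts = [
    [120,   0, 1,  95],  # gene A
    [ 80,   1, 0,  60],  # gene B
    [300, 250, 2, 400],  # surface-marker ADT
    [ 50, 900, 1,  70],  # hashtag HTO
]

# Barcode 1 has high ADT/HTO counts but almost no RNA, so it is not called.
keep = call_cells_on_rna(counts, feature_types, min_rna_total=100)
filtered = subset_barcodes(counts, keep)
```

The ADT and hashtag rows are carried through untouched for the retained barcodes, so they remain available for downstream steps; they simply play no part in deciding which barcodes are kept.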


Thanks Aaron for taking the time to reply, glad to know this approach is the right way to go.

Final off-topic question for you: where do you go if you want a second opinion on an approach you are taking? Is there a Slack or Discord group for sequencing work? IRC? Or do you stick to the Bioconductor forums (which I guess are mirrored on Biostars, or the other way around)?


There's a Bioconductor Slack group that you can sign up for here.

