Question

10X FeatureBarcode producing too many "cells"

msn ▴ 10

I am analyzing some single-cell human data that contains feature barcodes (10X's CITE-seq) for a few cell surface antibodies + 4 hashtags. The pipeline goes like this: 10,000 cells into the 10X --> NovaSeq --> bcl2fastq --> cellranger count

When I import the raw seq data and the seq+featureBarcode data into SingleCellExperiment in R, I get the same number of "cells" (737,280) for both. BUT when I remove the empty droplets from the seq data I get ~6000 remaining... a perfectly reasonable return for the experiment. When I remove the empty droplets from the featureBarcode+seq data I get 18,590, far more than expected...

With the addition of only 12 more features (8 cell surface markers, 4 hashtags), what is making such a big difference?

As a solution, do you think it's fair to just subset the 12 features out, run the empty droplet removal without them, then put the data back in for the remaining "cells" to continue downstream analysis? Or should I try to track down where the over 10,000 extra cells are coming from?

SingleCellExperiment DropletUtils SingleCellData • 1.8k views

Updated 3.3 years ago by Aaron Lun ★ 28k • written 3.3 years ago by msn ▴ 10

score 1 · Answer 1 · 2021-01-08

Aaron Lun ★ 28k

tl;dr just use the RNA counts.

This is consistent with other observations; the ADT/HTO counts do not behave similarly to the RNA counts in the statistical model used by emptyDrops. I speculate that this is because protein aggregates increase the variability of the ADT counts compared to the per-molecule sampling that you get for RNA data. When you try to model both sets of features together, the variability of the ADT counts is understated, causing the algorithm to call an excessive number of cells.

That said, you _can_ use the ADT counts for calling cells. Just don't use them together with the RNA counts, because the two sets of features have different variabilities - see comments here.
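For anyone landing here later, a minimal sketch of the RNA-only approach might look like the following. The path is a placeholder, and it assumes the matrix was produced by Cell Ranger 3 or later so that `read10xCounts()` fills in a `Type` column in the row data; the FDR threshold of 0.001 is just a commonly used choice, not something prescribed in this thread.

```r
library(DropletUtils)

# Placeholder path: point this at the unfiltered cellranger output directory.
sce <- read10xCounts("path/to/raw_feature_bc_matrix")

# Assumption: the feature type is stored in rowData(sce)$Type, with
# "Gene Expression" for RNA and "Antibody Capture" for the ADT/HTO features.
is.rna <- rowData(sce)$Type == "Gene Expression"

# emptyDrops() uses Monte Carlo simulation, so fix the seed for reproducibility.
set.seed(100)
e.out <- emptyDrops(counts(sce)[is.rna, ])

# Barcodes with too few counts to be tested get NA, which which() drops.
is.cell <- which(e.out$FDR <= 0.001)

# Keep every feature (RNA, ADT and HTO) for the barcodes called as cells.
sce.cells <- sce[, is.cell]
```

Only the cell calling ignores the 12 extra features here; they are still present in `sce.cells` for downstream analysis.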

Thanks, Aaron, for taking the time to reply. Glad to know this approach is the right way to go.

A final off-topic question for you: where do you go if you want a second opinion on an approach you are taking? Is there a Slack or Discord group for sequencing work? IRC? Or do you stick to the Bioconductor forums (which I guess are mirrored on Biostars, or the other way around)?