I am analyzing human single-cell data that includes feature barcodes (10x CITE-seq) for a few cell-surface antibodies plus 4 hashtags. The pipeline goes like this: 10,000 cells into the 10x --> NovaSeq --> bcl2fastq --> cellranger count
When I import the raw seq data and the seq + feature barcode data into a SingleCellExperiment in R, I get the same number of "cells" (barcodes), 737,280, for both. When I remove the empty droplets from the seq data I get ~6,000 remaining, a perfectly reasonable return for the experiment. But when I remove the empty droplets from the feature barcode + seq data I get 18,590, far more than expected.
With the addition of only 12 more features (8 cell-surface markers, 4 hashtags), what is making such a big difference?
As a solution, do you think it's fair to subset the 12 features out, remove empty droplets without them, then put the data back in for the remaining "cells" and continue with downstream analysis? Or should I try to track down where the over 10,000 extra cells are coming from?
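For reference, here is a minimal sketch of the "subset, call cells, put back" idea I have in mind. It assumes the empty-droplet step is `emptyDrops()` from DropletUtils, that the data were loaded with `read10xCounts()` (which, for Cell Ranger 3+ output, stores the feature type in `rowData(sce)$Type`), and that the antibodies and hashtags are labelled something other than "Gene Expression" there. The function name `call_cells_rna_only`, the `raw_dir` argument, and the FDR cutoff of 0.001 are my own placeholders, not anything prescribed by the packages.

```r
# Sketch only: call cells on the gene expression counts, then keep the
# antibody/hashtag counts for whichever barcodes survive.
# Needs the Bioconductor packages DropletUtils, SingleCellExperiment and
# SummarizedExperiment installed (calls are namespace-qualified below).
call_cells_rna_only <- function(raw_dir, fdr = 0.001) {
  # raw_dir: hypothetical path to cellranger's raw (unfiltered) matrix folder
  sce <- DropletUtils::read10xCounts(raw_dir)

  # Feature type per row, e.g. "Gene Expression" or "Antibody Capture".
  feature_type <- SummarizedExperiment::rowData(sce)$Type

  # Move the non-RNA features into alternative experiments so they no
  # longer contribute to the per-barcode totals used for cell calling.
  # This errors if no feature is labelled "Gene Expression".
  sce <- SingleCellExperiment::splitAltExps(sce, feature_type,
                                            ref = "Gene Expression")

  # emptyDrops() computes Monte Carlo p-values, so fix the seed.
  set.seed(100)
  out <- DropletUtils::emptyDrops(SummarizedExperiment::assay(sce, "counts"))

  # FDR is NA for barcodes that were not tested (very low totals);
  # which() silently drops those.
  is_cell <- which(out$FDR <= fdr)

  # Column subsetting carries the alternative experiments along, so the
  # antibody/hashtag counts stay attached to the retained barcodes.
  sce[, is_cell]
}

# Usage (path is a placeholder):
# sce_cells <- call_cells_rna_only("path/to/raw_feature_bc_matrix")
# SingleCellExperiment::altExpNames(sce_cells)
```

Splitting with `splitAltExps()` rather than deleting rows means nothing has to be merged back in afterwards. It does not by itself explain where the extra ~12,000 barcodes come from, so comparing the per-barcode antibody totals of the barcodes called only in the combined run would still be worth doing.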