Question

How can doubletCells() scores be converted to singlet/doublet annotation?

thkapell ▴ 10

@tkapell-14647

Helmholtz Center Munich, Germany

Hi all,

I used scran::doubletCells to predict doublets in a single-cell experiment. I converted my Seurat object to a SingleCellExperiment one and ran:

scores <- doubletCells(sce)

The output is a numeric vector with doublet scores for each cell, and I was wondering how the cells are annotated. Do I have to set a hard score threshold, or should I apply an x-percentile cut-off to discriminate singlets from doublets?

Additionally, reading the Detecting doublet cells with scran vignette, I had a couple of questions. My sce object contains both raw and normalised count tables, but when I run:

> sizeFactors(sce)
NULL

I suppose that the normalisation is run, but I do not know where the information is. In addition, does anyone have any experience with running the force matching argument?

Thanks, Theo

scran doublet • 1.7k views

updated 5.9 years ago by Aaron Lun ★ 29k • written 5.9 years ago by thkapell ▴ 10

score 0 · Answer 1 · 2020-03-26

The output is a numeric vector with doublet scores for each cell, and I was wondering how the cells are annotated. Do I have to set a hard score threshold, or should I apply an x-percentile cut-off to discriminate singlets from doublets?

I would consider looking for outliers with scater::isOutlier and type="higher". You'll have to choose a value for nmads, but that's no different from having to choose any other arbitrary threshold. I'd probably pick 3.
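
As a rough sketch of the outlier approach (not a definitive recipe), the snippet below calls isOutlier on a simulated score vector. The simulated `scores` and the names `flagged`, `is_doublet` and `annotation` are made up for illustration; with real data, `scores` would be the output of doubletCells(sce).

```r
library(scater)  # isOutlier() is available from scater (it lives in scuttle)

# Simulated stand-in for the doubletCells() output (assumption: 1000 cells,
# most with low scores and a small group with clearly higher scores).
set.seed(100)
scores <- c(rexp(950, rate = 5), rexp(50, rate = 0.2) + 3)

# Flag cells whose score lies more than 3 MADs above the median.
flagged <- isOutlier(scores, nmads = 3, type = "higher")

# The threshold that was actually applied is stored as an attribute.
attr(flagged, "thresholds")

# Convert to a plain logical vector, then to a singlet/doublet label.
is_doublet <- as.logical(flagged)
annotation <- ifelse(is_doublet, "doublet", "singlet")
table(annotation)
```

The labels could then be stored alongside the cells, for example in a colData column of your choosing.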

A hard score threshold would be very difficult to pick because the values have little absolute meaning. A percentile cut-off might be passable if you have an idea of the doublet rate in your experiment. However, I suspect that the doublet rate is not just a function of the number of cells, but also depends on the relative stickiness of your population.
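
For comparison, a percentile cut-off could be sketched in base R as follows. The value of `expected_rate` is a purely hypothetical doublet rate chosen for illustration, not a recommendation, and `scores` is again simulated rather than real doubletCells() output.

```r
# Simulated stand-in for the doubletCells() output (assumption).
set.seed(100)
scores <- c(rexp(950, rate = 5), rexp(50, rate = 0.2) + 3)

# Hypothetical expected doublet rate; replace with an estimate that is
# appropriate for your protocol and cell loading.
expected_rate <- 0.05

# Call the top `expected_rate` fraction of cells doublets.
cutoff <- quantile(scores, probs = 1 - expected_rate)
calls <- ifelse(scores > cutoff, "doublet", "singlet")
table(calls)
```

Note that this always calls the same fraction of cells doublets regardless of how the scores are distributed, which is why it is only as good as the assumed rate.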

I suppose that the normalisation is run, but I do not know where the information is.

I don't really know what you mean. Are you looking for the normalized values? Those are in assays(sce) under whatever name you gave them. Normally, if you run logNormCounts, the sizeFactors field is populated with library size-derived factors (assuming you didn't already have something there), and a "logcounts" assay is created. If you did your normalization some other way, then I wouldn't know what happened.
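
To see where things end up, here is a small sketch on a toy object. The count matrix is simulated and the gene and cell names are invented; only the behaviour of logNormCounts described above is being illustrated.

```r
library(SingleCellExperiment)
library(scater)  # provides logNormCounts()

# Toy count matrix (assumption: 100 genes x 20 cells of Poisson counts).
set.seed(1)
counts <- matrix(rpois(2000, lambda = 5), nrow = 100, ncol = 20)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))
colnames(counts) <- paste0("cell", seq_len(ncol(counts)))

sce <- SingleCellExperiment(assays = list(counts = counts))
is.null(sizeFactors(sce))  # no size factors have been stored yet

# Normalize; with no existing size factors, library size factors are used.
sce <- logNormCounts(sce)

assayNames(sce)         # now includes "logcounts"
head(sizeFactors(sce))  # now populated
```

An object converted from Seurat may carry normalized values in an assay without any stored size factors, which would be consistent with sizeFactors(sce) returning NULL.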

In addition, does anyone have any experience with running the force matching argument?

I wrote it, but I actually have little practical experience with it. You can have a look at my thoughts on the theoretical side here, if you haven't already. I don't put a lot of faith in the force matching; or indeed in the entire function; or indeed, in the entire class of functions in the field that rely on simulated doublets. The fundamental problem is that we don't have a good idea of the relative total RNA content of each cell, which means that we're really just hoping that our simulated doublets are good-enough proxies for the real thing.

And I'm not even talking about situations where doublets are formed by sticky cells where the adhesion induces transcriptional changes (e.g., immune synapses). Though at least there's some interesting biology there.