How can doubletCells() scores be converted to single/doublet annotation?
thkapell ▴ 10
@tkapell-14647
Last seen 2 days ago
Helmholtz Center Munich, Germany

Hi all,

I used scran::doubletCells to predict doublets in a single-cell experiment. I converted my Seurat object to a SingleCellExperiment one and ran:

scores <- doubletCells(sce)


The output is a numeric vector with a doublet score for each cell, and I was wondering how the cells are annotated. Do I have to set a hard score threshold, or should I apply an x-percentile cut-off to discriminate singlets from doublets?

Additionally, reading the Detecting doublet cells with scran vignette, I had a couple of questions. My sce object contains both raw and normalised count tables, but when I run:

> sizeFactors(sce)
NULL


I suppose that the normalisation is run, but I do not know where the information is. In addition, does anyone have any experience with running the force matching argument?

Thanks, Theo

scran doublet
Aaron Lun ★ 28k
@alun
Last seen 15 hours ago
The city by the bay

The output is a numeric vector with a doublet score for each cell, and I was wondering how the cells are annotated. Do I have to set a hard score threshold, or should I apply an x-percentile cut-off to discriminate singlets from doublets?

I would consider looking for outliers with scater::isOutlier and type="higher". You'll have to choose an nmads value, but that's no different from choosing any other arbitrary threshold. I'd probably pick 3.
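For illustration, the median-absolute-deviation logic behind isOutlier can be sketched in base R (the toy scores below are invented; in practice you'd pass the actual doubletCells() output):

```r
# Toy doublet scores (hypothetical); in practice, use the output of doubletCells().
scores <- c(0.1, 0.2, 0.15, 0.12, 0.18, 5.0)

# Roughly what scater::isOutlier(scores, nmads = 3, type = "higher") does:
# flag scores more than 3 MADs above the median.
nmads <- 3
cutoff <- median(scores) + nmads * mad(scores)
is.doublet <- scores > cutoff
is.doublet  # only the extreme score is flagged
```

With the real function you would call isOutlier(scores, nmads = 3, type = "higher") and use the returned logical vector to subset your cells.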

A hard score threshold would be very difficult to pick, as the values have little absolute meaning. A percentile cut-off might be passable if you have an idea of the doublet rate in your experiment. However, I suspect that rate is not just a function of the number of cells, but also of the relative stickiness of your population.

I suppose that the normalisation is run, but I do not know where the information is.

I don't really know what you mean. Are you looking for the normalized values? Those are in assays(sce) under whatever names you gave them. Normally, if you run logNormCounts(), the sizeFactors() field is populated with library size-derived factors (assuming you didn't already have something there) and a "logcounts" assay is created. If you did your normalization some other way, then I wouldn't know what happened.
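As a rough base-R sketch of that default behaviour (the toy matrix here is invented; logNormCounts() itself operates on the SingleCellExperiment directly):

```r
set.seed(1)
# Hypothetical genes-by-cells count matrix, standing in for counts(sce).
counts <- matrix(rpois(20, lambda = 5), nrow = 4, ncol = 5)

# Library size-derived size factors, which logNormCounts() falls back to
# when sizeFactors(sce) is NULL: column sums rescaled to a mean of 1.
sf <- colSums(counts) / mean(colSums(counts))

# The "logcounts" assay: log2 of size-factor-scaled counts plus a pseudo-count of 1.
logcounts <- log2(t(t(counts) / sf) + 1)
```

In the real workflow, sce <- logNormCounts(sce) stores both the size factors and the "logcounts" assay on the object for you.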

In addition, does anyone have any experience with running the force matching argument?

I wrote it, but I actually have little practical experience with it. You can have a look at my thoughts on the theoretical side here, if you haven't already. I don't put a lot of faith in the force matching; or indeed in the entire function; or indeed in the entire class of functions in the field that rely on simulated doublets. The fundamental problem is that we don't have a good idea of the relative total RNA content of each cell, which means that we're really just hoping that our simulated doublets are good-enough proxies for the real thing.

And I'm not even talking about situations where doublets are formed by sticky cells where the adhesion induces transcriptional changes (e.g., immune synapses). Though at least there's some interesting biology there.


Thanks a lot, Aaron. For the second point, I meant that I have the raw and normalised counts in assays(sce), saved from the Seurat object, but the sizeFactors slot is empty. In the end, doubletCells() runs fine, but I am not sure whether I should run logNormCounts() on top to populate the sizeFactors slot. Would it make a difference, or are the size factors calculated internally by doubletCells() anyway?


I'm pretty sure doubletCells() ignores the normalized values; it just uses the raw counts.