Non-empty droplets versus good-quality cells
rocanja

The emptyDrops algorithm is designed to identify cell populations with low RNA content that might otherwise be missed, especially in very heterogeneous samples. A known side effect, however, is that more low-quality cells may be recovered, e.g., droplets containing cell fragments, stripped nuclei or damaged cells, because these are still significantly non-empty. To distinguish such low-quality cells from 'proper' low-RNA-content cell types, the emptyDrops help pages recommend drawing an MA plot that compares the average expression of the retained low-count barcodes with that of the discarded barcodes, to see which genes drive the differences. Mitochondrial genes and genes coding for ribosomal subunits usually indicate low quality.
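A minimal Python/NumPy sketch of the MA-plot comparison described above, assuming a genes × barcodes matrix of raw UMI counts and boolean masks marking the retained low-count barcodes and the discarded barcodes. The function names `ma_values` and `top_genes` are hypothetical; this is not the emptyDrops or DropletUtils API, and normalizing each barcode to counts per 10,000 is an assumption made for illustration.

```python
import numpy as np

def ma_values(counts, retained, discarded, pseudo=1.0):
    """Per-gene M (log2 fold change) and A (mean log2 expression) between two barcode groups.

    counts: genes x barcodes array of raw UMI counts.
    retained, discarded: boolean masks over the barcode axis.
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each barcode to counts per 10,000 so library size does not dominate.
    lib = counts.sum(axis=0, keepdims=True)
    lib[lib == 0] = 1.0  # avoid dividing by zero for all-zero barcodes
    norm = counts / lib * 1e4
    log_ret = np.log2(norm[:, retained].mean(axis=1) + pseudo)
    log_dis = np.log2(norm[:, discarded].mean(axis=1) + pseudo)
    m = log_ret - log_dis          # > 0: higher in the retained low-count barcodes
    a = 0.5 * (log_ret + log_dis)  # average log2 expression
    return m, a

def top_genes(m, gene_names, n=10):
    """Names of the n genes most enriched in the retained barcodes."""
    order = np.argsort(m)[::-1]
    return [gene_names[i] for i in order[:n]]

# Toy data (illustrative only): 3 genes x 4 barcodes.
counts = np.array([
    [10, 10, 0, 0],    # enriched in the retained barcodes
    [0, 0, 10, 10],    # enriched in the discarded barcodes
    [10, 10, 10, 10],  # flat
])
retained = np.array([True, True, False, False])
m, a = ma_values(counts, retained, ~retained)
print(top_genes(m, ["MT-CO1", "ACTB", "RPL3"], n=1))  # ['MT-CO1']
```

On real data you would plot `a` against `m` and check whether mitochondrial and ribosomal-protein genes dominate the top of the ranked list.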

1) I am wondering whether anyone has considered using the percentage of intronic reads as an alternative metric to distinguish between these two scenarios. If the cytoplasmic RNA has leaked out of damaged cells, the remaining RNA should be enriched in nuclear, unspliced pre-mRNA molecules and should therefore show a larger proportion of intronic reads. 'Proper' low-RNA-content cell types, on the other hand, should have a 'normal' proportion of intronic reads, as most of their RNA should be mature, spliced cytoplasmic transcripts. Unfortunately, the CellRanger 'metrics-summary' only reports the percentage of intronic reads per sample, not per cell. Is there a tool that calculates the percentage of intronic reads per cell?
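To make the per-cell version of this idea concrete, here is a minimal sketch that computes, for each barcode, the fraction of counts coming from unspliced (intron-containing) molecules. It assumes you already have separate spliced and unspliced genes × barcodes count matrices, such as those produced by RNA-velocity quantification tools like velocyto; the function name `intronic_fraction` is hypothetical, and loading such matrices from disk is not shown.

```python
import numpy as np

def intronic_fraction(unspliced, spliced):
    """Per-barcode fraction of counts assigned to unspliced (intronic) molecules.

    unspliced, spliced: genes x barcodes count arrays with matching shapes.
    Returns NaN for barcodes with no counts at all.
    """
    u = np.asarray(unspliced, dtype=float).sum(axis=0)
    s = np.asarray(spliced, dtype=float).sum(axis=0)
    total = u + s
    # Suppress 0/0 warnings; those barcodes are set to NaN by np.where.
    with np.errstate(invalid="ignore", divide="ignore"):
        frac = np.where(total > 0, u / total, np.nan)
    return frac

# Toy matrices (illustrative only): 2 genes x 3 barcodes.
unspliced = np.array([[1, 0, 0], [1, 5, 0]])
spliced = np.array([[4, 5, 0], [4, 0, 0]])
frac = intronic_fraction(unspliced, spliced)
print(frac)
```

Under the hypothesis above, damaged cells would sit at the high end of this per-cell distribution.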

2) Alternatively, I was thinking of using the distribution of gene expression. Leaky low-quality cells should have lower overall gene expression, but since the leakage is random and unspecific, they would still express a broad range of genes, with each cell retaining a slightly different subset, leading to a noisy distribution across the population. 'Proper' low-RNA-content cell types, e.g., terminally differentiated cells, should instead have a distinct, cell-type-specific expression pattern: they would express only a small subset of genes, but at a decent level, and the distribution across the population should be more homogeneous. Any thoughts on this approach?
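One way to quantify the 'homogeneous versus noisy' intuition in point 2 is to compare the detected-gene sets of cells within a candidate population, for example with the mean pairwise Jaccard similarity. A population sharing a consistent expression program should score higher than one whose cells each retain a random subset of transcripts. This is a hypothetical metric offered as a sketch, not a method taken from any package; the detection threshold `min_count` and the function names are assumptions.

```python
import numpy as np

def detected(counts, min_count=1):
    """Boolean genes x cells matrix: True where a gene reaches min_count in a cell."""
    return np.asarray(counts) >= min_count

def mean_pairwise_jaccard(det):
    """Mean Jaccard similarity between the detected-gene sets of all pairs of cells.

    det: boolean genes x cells matrix (e.g. from `detected`).
    """
    d = np.asarray(det, dtype=float)
    n = d.shape[1]
    if n < 2:
        raise ValueError("need at least two cells")
    inter = d.T @ d                 # number of shared detected genes per cell pair
    sizes = d.sum(axis=0)           # number of detected genes per cell
    union = sizes[:, None] + sizes[None, :] - inter
    with np.errstate(invalid="ignore", divide="ignore"):
        jac = np.where(union > 0, inter / union, 1.0)  # two empty cells count as identical
    iu = np.triu_indices(n, k=1)    # each unordered pair once, no self-pairs
    return float(jac[iu].mean())

# Toy data: three cells sharing the same two genes vs three cells with disjoint genes.
consistent = np.array([[5, 4, 6], [3, 2, 4], [0, 0, 0]])
scattered = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 3]])
print(mean_pairwise_jaccard(detected(consistent)))  # 1.0
print(mean_pairwise_jaccard(detected(scattered)))   # 0.0
```

One caveat: sampling depth alone lowers the overlap between sparse cells, so populations are best compared at similar total counts (e.g. after downsampling).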

Tags: CellRanger, emptyDrops, scRNAseq, 10X

Re 1) I have found a package that seems to do exactly that: Muskovic et al. 2021, "DropletQC: improved identification of empty droplets and damaged cells in single-cell RNA-seq data". The paper describes it as follows: "This approach is based on a novel quality control metric, the nuclear fraction, which quantifies for each droplet the fraction of RNA originating from unspliced, nuclear pre-mRNA." The code is available from the DropletQC GitHub repository.
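As a rough illustration of how a per-droplet nuclear fraction could be combined with UMI counts, the sketch below flags cells whose nuclear fraction is unusually high for their cell type while their UMI count is below that cell type's median. This is a simplified stand-in written for illustration, not DropletQC's actual algorithm (see its documentation for how it identifies damaged cells); the function name, the median + k × MAD rule and the default `k = 3` are all assumptions. Working within cell types matters because some cell types may naturally have a higher nuclear fraction than others.

```python
import numpy as np

def flag_damaged(nuclear_fraction, umi_counts, cell_types, k=3.0):
    """Hypothetical heuristic: flag cells with an unusually high nuclear fraction
    and a below-median UMI count, judged within each cell type separately."""
    nf = np.asarray(nuclear_fraction, dtype=float)
    umi = np.asarray(umi_counts, dtype=float)
    types = np.asarray(cell_types)
    flags = np.zeros(nf.shape, dtype=bool)
    for t in np.unique(types):
        idx = types == t
        med = np.median(nf[idx])
        mad = np.median(np.abs(nf[idx] - med))  # unscaled median absolute deviation
        high_nf = nf[idx] > med + k * mad
        low_umi = umi[idx] < np.median(umi[idx])
        flags[idx] = high_nf & low_umi
    return flags

# Toy data (illustrative only): type B has a high baseline nuclear fraction.
nf = [0.10, 0.12, 0.11, 0.10, 0.60, 0.50, 0.52, 0.51]
umi = [1000, 1100, 1050, 990, 200, 500, 500, 500]
types = ["A"] * 5 + ["B"] * 3
print(flag_damaged(nf, umi, types).tolist())
# [False, False, False, False, True, False, False, False]
```

Only the type-A outlier is flagged; the type-B cells share a high nuclear fraction and so are not treated as damaged.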


General musings about how to do stuff should go elsewhere; this support site is meant for questions specifically about Bioconductor packages.
