Question

When to combine samples in the pre-processing of 10x scRNA-seq data?

jma1991 ▴ 70

I am following the Bioconductor simpleSingleCell workflow for droplet-based data and have a question regarding pre-processing. I have 10x scRNA-seq data from multiple samples. These were prepared in different wells on the same Chromium chip and run in the same lane of a single flowcell on a HiSeq 4000 sequencing machine. I ultimately want to perform comparisons between the samples; however, I'm not sure at what stage of pre-processing the samples should be combined. In particular, the RNA content and activity of cells between samples may differ markedly, so I assume the empty droplet detection step should be performed independently? Given that cells from different samples are physically separated on the Chromium chip, I also assume doublet detection should be performed independently?

My proposed workflow would be the following:

1. Remove barcode swapping (All samples)
2. Remove empty droplets (Per sample)
3. Calculate QC metrics (Per sample)
4. Remove low quality cells (Per sample)
5. Assign cell cycle phases (Per sample)
6. Remove zero count genes (Per sample, may cause problems later)
7. Normalization for cell-specific biases (Per sample)
8. Modelling the mean-variance trend (Per sample)
9. Dimensionality reduction (Per sample)
10. Clustering (Per sample)
11. Remove doublets detected by clusters / by simulation (Per sample)
12. Combine raw count matrices from all remaining cells across samples
13. Go back to the normalization step (7) and process all samples together
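
To make the "may cause problems later" caveat concrete, here is a minimal Python sketch (not Bioconductor code) of what happens when zero count genes are removed per sample and the raw matrices are combined afterwards. The toy data, gene names, and the helper names `drop_zero_count_genes` and `combine_samples` are hypothetical and only illustrate the idea. After per-sample filtering the samples no longer share the same gene set, so combining them needs a union of genes with zero padding. That padding is only exact here because the dropped genes really did have zero counts in that sample.

```python
from typing import Dict, List, Tuple

# Hypothetical toy layout: gene -> list of counts, one entry per cell.
Matrix = Dict[str, List[int]]


def drop_zero_count_genes(matrix: Matrix) -> Matrix:
    """Remove genes whose total count is zero within a single sample."""
    return {gene: counts for gene, counts in matrix.items() if sum(counts) > 0}


def n_cells(matrix: Matrix) -> int:
    """Number of cells (columns) in a sample; 0 if no genes remain."""
    return len(next(iter(matrix.values()))) if matrix else 0


def combine_samples(samples: Dict[str, Matrix]) -> Tuple[Matrix, List[str]]:
    """Column-bind samples on the union of genes, padding absent genes with zeros."""
    genes = sorted({gene for matrix in samples.values() for gene in matrix})
    combined: Matrix = {gene: [] for gene in genes}
    sample_of_cell: List[str] = []
    for name, matrix in samples.items():
        k = n_cells(matrix)
        sample_of_cell.extend([name] * k)
        for gene in genes:
            # A gene dropped from this sample had zero counts, so zeros are exact.
            combined[gene].extend(matrix.get(gene, [0] * k))
    return combined, sample_of_cell


# Toy raw counts for two samples (illustrative values only).
samples = {
    "A": {"Gene1": [3, 0], "Gene2": [0, 0], "Gene3": [1, 2]},
    "B": {"Gene1": [0, 0, 0], "Gene2": [4, 1, 0], "Gene3": [2, 0, 5]},
}

filtered = {name: drop_zero_count_genes(m) for name, m in samples.items()}
combined, sample_of_cell = combine_samples(filtered)
```

In this toy case sample A loses Gene2 and sample B loses Gene1, so the two filtered matrices have different rows. Skipping the per-sample gene filter, and filtering genes once on the combined matrix instead, avoids the mismatch altogether.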

Does this seem reasonable, or am I over-complicating the pre-processing steps?

10x scrna-seq • 2.3k views

updated 5.0 years ago by Aaron Lun ★ 28k • written 5.0 years ago by jma1991 ▴ 70

score 2 · Answer 1 · 2019-04-30

If you keep following the workflows, you will see that there is one for batch correction, so I'm not going to repeat those recommendations here.

I will, however, make some specific comments:

> In particular, the RNA content and activity of cells between samples may differ markedly, so I assume the empty droplet detection step should be performed independently?

Yes.
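
One way to see why, as a toy Python sketch rather than the actual method: empty droplet detection of the kind used in the workflow compares each barcode against an ambient RNA profile estimated from low-count barcodes, and that profile can differ between samples. The function `ambient_profile`, the gene names, and all counts below are hypothetical. The `lower` threshold is only meant to echo the low-count cutoff idea, not to reproduce any library's behaviour.

```python
from typing import Dict, List

Barcode = Dict[str, int]  # gene -> count for one barcode


def ambient_profile(barcodes: List[Barcode], lower: int = 100) -> Dict[str, float]:
    """Pool barcodes with total count <= lower and return per-gene proportions."""
    pooled: Dict[str, int] = {}
    for counts in barcodes:
        if sum(counts.values()) <= lower:
            for gene, n in counts.items():
                pooled[gene] = pooled.get(gene, 0) + n
    total = sum(pooled.values())
    return {gene: n / total for gene, n in pooled.items()} if total else {}


# Toy barcodes: two low-count (ambient) barcodes and one cell in sample A,
# one low-count barcode and one cell in sample B. Values are illustrative only.
sample_a = [
    {"Hbb": 8, "Actb": 2},
    {"Hbb": 8, "Actb": 2},
    {"Hbb": 50, "Actb": 450},
]
sample_b = [
    {"Hbb": 1, "Actb": 9},
    {"Hbb": 10, "Actb": 300},
]

profile_a = ambient_profile(sample_a)             # dominated by Hbb
profile_b = ambient_profile(sample_b)             # dominated by Actb
profile_pooled = ambient_profile(sample_a + sample_b)  # matches neither sample
```

Here the pooled profile sits between the two sample-specific ones, so testing barcodes against it would use the wrong background for both samples.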

> Given that cells from different samples are physically separated on the Chromium chip, I also assume doublet detection should be performed independently?

Yes.

> Clustering (Per sample)

This is usually a good idea for sanity checking purposes. But for your primary analysis, you really want to have common clusters across all samples. This simplifies comparisons between samples, especially in the case where your clusters are not distinctly defined. (Try figuring out which clusters match up in early embryonic development!) And even if they are well-defined, you don't want to have to manually annotate every cell type X times for X samples. Just put in the hard work once for the common clusters.
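
As a small illustration of why common clusters simplify comparisons, here is a Python sketch assuming you already have one cluster label and one sample label per cell from a joint clustering of the combined data. The function `cluster_by_sample` and the labels are hypothetical. With shared labels, a single cross-tabulation shows how each sample contributes to each cluster, with no need to match clusters up between samples.

```python
from collections import Counter
from typing import Dict, List


def cluster_by_sample(
    cluster_labels: List[str], sample_labels: List[str]
) -> Dict[str, Dict[str, int]]:
    """Cross-tabulate cells: cluster -> sample -> number of cells."""
    counts = Counter(zip(cluster_labels, sample_labels))
    clusters = sorted(set(cluster_labels))
    samples = sorted(set(sample_labels))
    # Counter returns 0 for unseen (cluster, sample) pairs.
    return {c: {s: counts[(c, s)] for s in samples} for c in clusters}


# Toy labels for seven cells (illustrative only).
clusters = ["c1", "c1", "c2", "c1", "c2", "c2", "c3"]
samples = ["A", "A", "A", "B", "B", "B", "B"]

table = cluster_by_sample(clusters, samples)
for cluster, row in table.items():
    print(cluster, row)
```

A cluster that is empty in one sample, like `c3` in sample A here, shows up directly in the table, which is much harder to spot when each sample has its own cluster numbering.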