When to combine samples in the pre-processing of 10x scRNA-seq data?
jma1991 ▴ 70
@jma1991-11856

I am following the Bioconductor simpleSingleCell workflow for droplet-based data and have a question about pre-processing. I have 10x scRNA-seq data from multiple samples. These were prepared in different wells on the same Chromium chip and run in the same lane of a single flow cell on a HiSeq 4000 sequencer. I ultimately want to perform comparisons between the samples, but I'm not sure at what stage of pre-processing the samples should be combined. In particular, the RNA content and activity of cells between samples may differ markedly so I assume empty droplet detection step should be performed independently? Given cells from different samples are physically separated on the Chromium chip I also assume doublet detection should be performed independently?

My proposed workflow would be the following:

  1. Remove barcode swapping (All samples)
  2. Remove empty droplets (Per sample)
  3. Calculate QC metrics (Per sample)
  4. Remove low quality cells (Per sample)
  5. Assign cell cycle phases (Per sample)
  6. Remove zero count genes (Per sample, may cause problems later)
  7. Normalization for cell-specific biases (Per sample)
  8. Modelling the mean-variance trend (Per sample)
  9. Dimensionality reduction (Per sample)
  10. Clustering (Per sample)
  11. Remove doublets detected by clusters / by simulation (Per sample)
  12. Combine raw count matrices from all remaining cells across samples
  13. Go back to the normalization step (7) and process all samples together
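
To make the ordering in steps 1-13 concrete, here is a toy sketch in plain Python. It is not the Bioconductor API: the function names, the per-sample thresholds, and the tiny count tables are all made up for illustration, and a simple total-count cutoff stands in for proper empty droplet and quality filtering. It only shows the shape of the idea: filter each sample on its own, combine the raw counts of the surviving cells, then compute normalization factors across all cells together.

```python
# Toy illustration only: plain Python stand-ins, not the Bioconductor API.
# Data layout (assumed): raw[sample][barcode] = {gene: UMI count}

def filter_cells(cells, min_total):
    """Keep barcodes whose total count reaches a per-sample threshold
    (a crude stand-in for empty droplet and low-quality cell removal)."""
    return {bc: c for bc, c in cells.items() if sum(c.values()) >= min_total}

def combine(samples):
    """Merge per-sample raw counts on the union of genes, filling absent
    genes with zero and prefixing each barcode with its sample name."""
    genes = sorted({g for cells in samples.values()
                    for c in cells.values() for g in c})
    combined = {}
    for name, cells in samples.items():
        for bc, c in cells.items():
            combined[f"{name}_{bc}"] = {g: c.get(g, 0) for g in genes}
    return genes, combined

def library_size_factors(combined):
    """Library-size factors scaled to a mean of 1 across all cells jointly."""
    totals = {bc: sum(c.values()) for bc, c in combined.items()}
    mean_total = sum(totals.values()) / len(totals)
    return {bc: t / mean_total for bc, t in totals.items()}

# Hypothetical raw counts for two samples; barcode AAAC occurs in both.
raw = {
    "A": {"AAAC": {"Gapdh": 50, "Actb": 30}, "AAAG": {"Gapdh": 1}},
    "B": {"AAAC": {"Gapdh": 5, "Sox2": 15}, "AAAT": {"Sox2": 1}},
}
thresholds = {"A": 40, "B": 10}  # hypothetical per-sample cutoffs

# Per-sample filtering, then combine raw counts, then normalize together.
kept = {s: filter_cells(raw[s], thresholds[s]) for s in raw}
genes, combined = combine(kept)
size_factors = library_size_factors(combined)
```

Two details in the sketch relate to the list above. The same barcode can appear in more than one sample, so cell names are prefixed with the sample before combining. And a gene with zero counts in one sample (here `Sox2` in sample A) may be expressed in another, which is why removing zero-count genes per sample (step 6) can leave the samples with mismatched gene sets; the sketch sidesteps this by combining on the union of genes.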

Does this seem reasonable, or am I over-complicating the pre-processing steps?

10x scrna-seq • 2.3k views
Aaron Lun ★ 28k
@alun

If you keep following the workflows, you will see that there is one for batch correction, so I'm not going to repeat those recommendations here.

I will, however, make some specific comments:

In particular, the RNA content and activity of cells between samples may differ markedly so I assume empty droplet detection step should be performed independently?

Yes.

Given cells from different samples are physically separated on the Chromium chip I also assume doublet detection should be performed independently?

Yes.

Clustering (Per sample)

This is usually a good idea for sanity checking purposes. But for your primary analysis, you really want to have common clusters across all samples. This simplifies comparisons between samples, especially in the case where your clusters are not distinctly defined. (Try figuring out which clusters match up in early embryonic development!) And even if they are well-defined, you don't want to have to manually annotate every cell type X times for X samples. Just put in the hard work once for the common clusters.
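
A toy sketch in plain Python of why common clusters simplify comparisons. The sample names, cluster labels, and cell assignments are invented, and the clustering itself is assumed to have already been done once on all cells together. With shared labels, a per-sample table of cluster proportions can be read off directly, with no need to match clusters between samples by hand.

```python
from collections import Counter

# Hypothetical (sample, cluster) pairs from ONE joint clustering of all cells.
cells = [
    ("A", "c1"), ("A", "c1"), ("A", "c2"),
    ("B", "c1"), ("B", "c2"), ("B", "c2"), ("B", "c2"),
]

def cluster_proportions(cells):
    """Return {sample: {cluster: fraction of that sample's cells}}."""
    per_pair = Counter(cells)                    # cells per (sample, cluster)
    per_sample = Counter(s for s, _ in cells)    # cells per sample
    clusters = sorted({c for _, c in cells})
    return {
        s: {c: per_pair[(s, c)] / n for c in clusters}
        for s, n in sorted(per_sample.items())
    }

props = cluster_proportions(cells)
for sample, row in props.items():
    print(sample, {c: round(p, 2) for c, p in row.items()})
```

Because `c1` and `c2` mean the same thing in every sample, each cluster only needs to be annotated once.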


Thank you, Aaron, for confirming that empty droplet and doublet detection should be performed separately. However, I'm not sure why you suggested looking at the batch correction workflow. The samples were all prepared on the same chip (albeit in different channels) and sequenced on the same lane of a flow cell. I assumed batch correction was only appropriate when, for example, the samples were sequenced on different dates or prepared by different labs. Additionally, two of the samples contain distinct cell types, so MNN-based correction may not work? Apologies if I have misunderstood anything in your reply.
