Question: Recommendations for combining multiple 10x runs into one SingleCellExperiment
Asked 13 months ago by Peter Hickey (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia):

I've got an experiment with eight 10x scRNA-seq runs that I'm analysing by starting with something based on the simpleSingleCell workflow. I constructed each SingleCellExperiment with DropletUtils::read10xCounts().

I'm looking for any opinions or advice on when to combine these into one SingleCellExperiment object vs. having, say, a list of SingleCellExperiment objects (one per run)? I've been tossing up between a few options:

  1. Pass all runs via the samples argument of DropletUtils::read10xCounts() and add the run to the colData.
  2. Filter each run (to remove empty drops) independently and then combine.
  3. Keep them separate until running something like scran::mnnCorrect() (which would seem to need a separate expression matrix for each run, anyway).

Thanks, Pete

Answer: Recommendations for combining multiple 10x runs into one SingleCellExperiment
Answered 13 months ago by Aaron Lun (Cambridge, United Kingdom):

I'd process cells in each run separately up until the point that they need to be combined. This is actually necessary for some procedures: emptyDrops, because the ambient pool will probably differ between runs, and doubletCells, because doublets can't form between runs. Processing them separately will also make quality control of individual samples clearer, and will generally give you a more precise idea of what is present in each sample before you try to mush everything together into a single data set.
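
As a rough illustration of the per-run approach, here is a minimal sketch rather than a definitive pipeline. The helper name process_run, the run_dirs paths and the FDR threshold are all assumptions made for the example; only read10xCounts() and emptyDrops() come from DropletUtils.

```r
# Sketch only: `process_run` is a hypothetical helper, and the FDR
# threshold is an assumed value, not a recommendation from this thread.
process_run <- function(path, run_name, fdr = 0.001) {
  # Load one 10x run into its own SingleCellExperiment.
  sce <- DropletUtils::read10xCounts(path)
  # Test each barcode against this run's own ambient profile.
  # emptyDrops() uses random sampling, so set a seed for reproducible calls.
  e_out <- DropletUtils::emptyDrops(SingleCellExperiment::counts(sce))
  # FDR is NA for barcodes below the `lower` total-count limit;
  # which() drops those along with the non-significant barcodes.
  sce <- sce[, which(e_out$FDR <= fdr)]
  # Record the run of origin so it survives the later merge.
  sce$run <- run_name
  sce
}

# Hypothetical usage (paths are placeholders):
# run_dirs <- c(run1 = "path/to/run1", run2 = "path/to/run2")
# sce_list <- Map(process_run, run_dirs, names(run_dirs))
# ... per-run QC and doublet detection on each element of sce_list ...
# combined <- do.call(cbind, sce_list)
```

Keeping a named list until the final cbind() means each run can still be inspected on its own, and the run column keeps track of where each cell came from afterwards.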

The only downside of processing them separately is that you cannot detect genes that are highly variable across samples. If some samples are from different conditions, the standard variance modelling within each sample will not pick up genes that are differentially expressed (DE) only between conditions. How much of a problem this is depends on your downstream applications. If you're going to batch correct across all samples anyway, then it doesn't matter, as any DE genes would end up being wiped out by the batch correction.
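
A toy example may make the variance point concrete. This is a sketch on simulated values in base R, with all numbers invented for illustration: a single gene whose mean differs between two samples but whose spread within each sample is small.

```r
set.seed(1)

# Two hypothetical samples from different conditions, 50 cells each.
# The gene has a different mean in each sample but the same small
# spread within either one.
sample_a <- rnorm(50, mean = 2, sd = 0.1)
sample_b <- rnorm(50, mean = 5, sd = 0.1)

# Variance computed within each sample separately: both are small,
# so the gene would not stand out as highly variable.
within_var <- c(a = var(sample_a), b = var(sample_b))

# Variance computed across all cells pooled together: dominated by
# the difference in means between the two samples.
pooled_var <- var(c(sample_a, sample_b))

print(within_var)
print(pooled_var)
```

The pooled variance is far larger than either within-sample variance, which is the signal that per-sample variance modelling cannot see.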

You can avoid this with careful experimental design, e.g., paired WT/KO samples in each batch so that correction cannot remove genotype differences. You can also detect DE genes between conditions by summing cells within each batch (possibly per population) and treating them as pseudo-bulk for edgeR analyses (see https://doi.org/10.1093/biostatistics/kxw055). This complements a batch-corrected single-cell-level analysis, e.g., when a treatment causes both a systematic DE and changes in population composition.
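
The summing step can be sketched in base R as follows. The count matrix and batch labels are simulated purely for illustration, and in practice you would probably sum within each batch and population rather than per batch alone.

```r
set.seed(42)

# Simulated counts: 5 genes (rows) by 12 cells (columns).
counts <- matrix(
  rpois(5 * 12, lambda = 3), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:12))
)

# Hypothetical batch label for each cell, four cells per batch.
batch <- rep(c("batch1", "batch2", "batch3"), each = 4)

# rowsum() sums rows by group, so transpose to put cells in rows,
# sum within each batch, then transpose back to genes x batches.
pseudo_bulk <- t(rowsum(t(counts), group = batch))

print(pseudo_bulk)
```

The resulting genes-by-batches matrix has the same shape as a bulk RNA-seq count matrix, so it could be handed to something like edgeR::DGEList(counts = pseudo_bulk) together with a design describing the condition of each batch.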

Also, make sure you read the devel version of the batch correction workflow, which is quite a bit more performant than mnnCorrect.


Thanks, Aaron. Very helpful (as always).

Reply written 13 months ago by Peter Hickey