Question

Normalization and Quality Control for Multiple scRNAseq Data Sets


lmm278

Hello,

I am trying to analyze scRNAseq data from multiple 10X Genomics Chromium samples that were sequenced in different runs. The samples come from injured tissue containing many different cell types, and I would like to compare relative gene expression both within and between cell types over time (I have one sample from each of 5 time points, each sequenced separately). I am new to working with large data sets, and I want to apply proper quality control and normalization before clustering. I have a rough understanding of the potential issues with comparing data sets that were sequenced separately, so my questions are: Are batch effects and dropout the only issues I need to control for when combining data sets from different sequencing runs? If so, what are the best ways to handle them (preferably in R)? Also, is it acceptable to normalize across runs using housekeeping gene expression, as is done in many qPCR analyses?

Thank you! (and sorry for the many questions)
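To make the QC and housekeeping-gene questions concrete, here is a minimal Python/numpy sketch of common per-cell QC metrics, MAD-based outlier flagging, and library-size scaling normalization, with a qPCR-style housekeeping size factor for comparison. In R this is usually done with Bioconductor packages such as scater and scran; the function names below are hypothetical helpers, and the sketch assumes a genes × cells UMI count matrix plus a boolean mask marking mitochondrial genes.

```python
import numpy as np

# Hypothetical helpers (not a package API). `counts` is genes x cells:
# rows are genes, columns are cells.

def qc_metrics(counts, mito_mask):
    """Per-cell QC metrics: library size, detected genes, mitochondrial fraction."""
    lib_size = counts.sum(axis=0)                  # total UMIs per cell
    n_detected = (counts > 0).sum(axis=0)          # genes with a nonzero count
    mito = counts[mito_mask].sum(axis=0)           # UMIs from mitochondrial genes
    mito_frac = mito / np.maximum(lib_size, 1)     # avoid dividing by zero for empty cells
    return lib_size, n_detected, mito_frac


def mad_outliers(values, nmads=3.0, log=True, lower=True):
    """Flag cells more than `nmads` scaled MADs below (or above) the median."""
    x = np.asarray(values, dtype=float)
    if log:
        x = np.log10(x + 1)
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))      # MAD scaled to match the SD for normal data
    return (x < med - nmads * mad) if lower else (x > med + nmads * mad)


def library_size_factors(counts):
    """Size factors proportional to total counts, centred to a mean of 1.
    Cells with zero counts should be removed by QC first."""
    lib_size = counts.sum(axis=0).astype(float)
    return lib_size / lib_size.mean()


def housekeeping_size_factors(counts, hk_mask):
    """qPCR-style alternative: size factors from a few housekeeping genes only."""
    hk_total = counts[hk_mask].sum(axis=0).astype(float)
    return hk_total / hk_total.mean()


def log_normalize(counts, size_factors):
    """Divide each cell's counts by its size factor, then take log2(x + 1)."""
    return np.log2(counts / size_factors + 1)
```

In sparse droplet data even a well-expressed housekeeping gene is often zero in some cells, which gives those cells a size factor of 0 (or a very noisy one). Scaling by total counts, or by the pooling-based size factors in scran, draws on many genes at once and avoids that problem, although library-size scaling does assume that most genes are not differentially expressed between cells.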

Tags: normalization, rnaseq, genetics, batch effect, dropout


Answer 1 · 2018-10-12

There is a great series of articles by Aaron Lun, Davis McCarthy, and John Marioni that covers many aspects of working with single-cell data in the Bioconductor ecosystem. I will link to both the "release" and "development" versions of these articles.

If I were you, I'd focus on the devel versions. These will become the "release" version of Bioconductor soon (in about a month or so, I think). Setting up a "devel" environment takes a bit more work, but given your particular problem and how close devel is to release, I think it will be well worth your time.

Although all of these articles will be useful as you come up to speed with analyzing single-cell data, the "Correcting batch effects" article is the most relevant to your direct question.
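A widely used approach to batch correction in the Bioconductor ecosystem is mutual nearest neighbours (MNN) correction, implemented in the batchelor package: cells from two batches that are each other's nearest neighbours are paired, and those pairs are used to estimate how far one batch is shifted relative to the other. Below is a heavily simplified Python/numpy sketch of that idea, assuming cells in rows and an already log-normalized (or PCA-reduced) matrix. The function names and the `k` and `sigma` defaults are illustrative assumptions, and the published method adds further steps (such as cosine normalization), so this is only meant to show the mechanics.

```python
import numpy as np

# Simplified sketch of mutual-nearest-neighbour (MNN) batch correction.
# Rows are cells, columns are features (e.g. log-expression or PCA coordinates).
# Function names and defaults (k, sigma) are illustrative assumptions.

def knn_indices(query, ref, k):
    """For each row of `query`, indices of its k nearest rows in `ref` (Euclidean)."""
    d2 = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


def mnn_pairs(ref, target, k):
    """(i, j) pairs where ref cell i and target cell j are in each other's k-NN sets."""
    ref_to_tgt = knn_indices(ref, target, k)
    tgt_to_ref = knn_indices(target, ref, k)
    pairs = []
    for i in range(ref.shape[0]):
        for j in ref_to_tgt[i]:
            if i in tgt_to_ref[j]:
                pairs.append((i, int(j)))
    return pairs


def mnn_correct(ref, target, k=10, sigma=1.0):
    """Move `target` cells towards `ref` using Gaussian-smoothed MNN pair vectors."""
    pairs = mnn_pairs(ref, target, k)
    if not pairs:
        raise ValueError("no mutual nearest neighbours found; try a larger k")
    ref_idx = np.array([i for i, _ in pairs])
    tgt_idx = np.array([j for _, j in pairs])
    # One correction vector per MNN pair: where the paired reference cell sits
    # relative to the paired target cell.
    vectors = ref[ref_idx] - target[tgt_idx]
    # Each target cell averages the pair vectors, weighted by a Gaussian kernel on
    # its distance to the paired target cells, so nearby pairs dominate.
    d2 = ((target[:, None, :] - target[tgt_idx][None, :, :]) ** 2).sum(axis=2)
    d2 = d2 - d2.min(axis=1, keepdims=True)    # stabilise exp(); weights are rescaled below
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    return target + w @ vectors
```

With five runs, one option is to merge batches one at a time onto a growing reference. Note that because each time point was sequenced in its own run, batch and time are fully confounded in this design: correction aligns matched populations across runs, which helps clustering and cell-type annotation, but it also removes any genuine time-related shift in those populations, so corrected values are better suited to clustering than to quantifying expression changes over time.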

Good luck!