I am trying to analyze scRNA-seq data from multiple 10x Genomics Chromium samples that were sequenced in different runs. My data come from injured tissue that contains many different cell types, and I'd like to compare relative gene expression both within and between cell types over time (I have one sample from each of 5 time points, all sequenced separately). I am new to working with large data sets, and I am trying to implement proper quality control and normalization before clustering. I have a rough understanding of the potential issues with comparing data sets that were sequenced separately.

I am wondering: are batch effects and dropout the only issues I need to control for when combining data sets from different sequencing runs? If so, does anyone have suggestions on the best way to handle them (preferably in R)? Also, is it acceptable to normalize across runs using housekeeping gene expression, as is done in many qPCR analyses?
Thank you! (and sorry for the many questions)