Dear all, Laleh & Aaron,
I'm hoping someone can point me to the best way to handle my data.
The experimental design is as follows. Our study has individuals from three distinct conditions (say, control, condition1 and condition2). Cells from each individual are sorted into two related types (say, A and B). Both samples from an individual are sequenced with 10X in the same batch/run. For practical reasons we cannot assay cells from different individuals in the same batch, so each batch corresponds to exactly one individual.
Cell type A is of greater biological interest. In particular we are interested in transcriptional changes in cells of type A between the control individuals and either condition.
The number of cells per sample varies from ~500 to ~10,000, and the median reads per cell varies from ~30 up to ~3,000. Cell type B is more temperamental than cell type A, which results in lower-quality data. This is a pilot study, so we currently only have data from 7 individuals (~15,000 cells in total).
This seems to fit the rationale for MnnCorrect fairly well, in that we wish to correct for batch effects when the experimental design confounds batches with individuals. However, if I run MnnCorrect on all the batches together, I will likely remove the very effect we are looking for. Is my best bet to run MnnCorrect separately on the individuals within each condition? And what is the best way to make use of the cell sorting to help with the batch correction?
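To make the per-condition option concrete, here is a minimal Python sketch of the idea. It is not mnnCorrect itself: it only identifies mutual nearest neighbour (MNN) pairs after cosine normalisation, and only between batches (individuals) that share a condition, so cells are never matched across conditions. The function names, the value of k and the brute-force Euclidean search are my own hypothetical choices for illustration.

```python
import numpy as np

# Sketch only: find mutual nearest neighbour (MNN) pairs between batches,
# restricted to batches from the same condition. Function names and k are
# hypothetical choices for illustration, not part of mnnCorrect's API.

def cosine_normalize(X):
    """Scale each cell (row) to unit L2 norm."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero cells unchanged
    return X / norms

def knn_indices(A, B, k):
    """For each row of A, indices of its k nearest rows of B (brute-force Euclidean)."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d, axis=1)[:, :k]

def find_mnn_pairs(X1, X2, k=20):
    """Return (i, j) pairs where cell i of X1 and cell j of X2 are in each other's k-NN."""
    X1n, X2n = cosine_normalize(X1), cosine_normalize(X2)
    nn12 = knn_indices(X1n, X2n, k)  # neighbours in batch 2 of each batch-1 cell
    nn21 = knn_indices(X2n, X1n, k)  # neighbours in batch 1 of each batch-2 cell
    pairs = []
    for i in range(X1n.shape[0]):
        for j in nn12[i]:
            if i in nn21[j]:  # mutual: i is also among j's nearest neighbours
                pairs.append((i, int(j)))
    return pairs

def mnn_pairs_within_conditions(batches, conditions, k=20):
    """batches: batch id -> cells x genes array; conditions: batch id -> condition label.
    Only pairs of batches sharing a condition label are compared."""
    ids = sorted(batches)
    result = {}
    for a_pos, a in enumerate(ids):
        for b in ids[a_pos + 1:]:
            if conditions[a] == conditions[b]:
                result[(a, b)] = find_mnn_pairs(batches[a], batches[b], k)
    return result
```

The brute-force distance matrix is only practical for toy examples; with up to ~10,000 cells per sample I would expect to run this on a PCA of the variable genes with a proper nearest-neighbour index instead.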
I have no experience with data of such low coverage. Most, if not all, single-cell methods seem to be designed for data with much higher coverage. MnnCorrect, at least, does not seem to assume high-depth data.
As the median reads per cell vary so much across samples, I do not wish to apply standard QC metrics to the data as a whole. Would you recommend applying standard QC procedures to each batch separately? At the moment I am only doing very simple QC on the data as a whole: removing cells with a high percentage of mitochondrial reads and filtering the genes down to the 10% most variable. I'm not using any thresholds on the number of genes detected or on total read counts.
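For concreteness, this is roughly what I have in mind by per-batch QC, as a minimal Python sketch rather than an established pipeline. Within each batch, a cell is flagged if its log library size or log number of detected genes falls more than 3 median absolute deviations (MADs) below that batch's median, or if its mitochondrial percentage lies more than 3 MADs above it. The 3-MAD cut-off, the function names and the choice of metrics are my own assumptions; in R, I believe scater's isOutlier() with its batch argument does something similar.

```python
import numpy as np

# Hypothetical helper names; the 3-MAD threshold is an illustrative assumption.

def mad_outliers(values, nmads=3.0, direction="lower"):
    """Flag values more than nmads (scaled) MADs below or above the median."""
    med = np.median(values)
    mad = 1.4826 * np.median(np.abs(values - med))  # consistent with SD for normal data
    # Note: if mad == 0, any value strictly beyond the median is flagged.
    if direction == "lower":
        return values < med - nmads * mad
    return values > med + nmads * mad

def per_batch_qc(counts, batch, mito_mask, nmads=3.0):
    """counts: cells x genes; batch: per-cell batch label; mito_mask: boolean per gene.
    Returns a boolean mask of cells to keep, with thresholds computed per batch."""
    keep = np.ones(counts.shape[0], dtype=bool)
    for b in np.unique(batch):
        idx = batch == b
        sub = counts[idx]
        lib = sub.sum(axis=1)                  # total counts per cell
        n_genes = (sub > 0).sum(axis=1)        # genes detected per cell
        mito_pct = 100 * sub[:, mito_mask].sum(axis=1) / np.maximum(lib, 1)
        drop = (mad_outliers(np.log10(lib + 1), nmads, "lower")
                | mad_outliers(np.log10(n_genes + 1), nmads, "lower")
                | mad_outliers(mito_pct, nmads, "upper"))
        keep[idx] = ~drop
    return keep
```

Because the medians and MADs are computed within each batch, a shallowly sequenced batch is judged against its own distribution rather than against the deeper batches.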
Thanks for taking the time to read this and for a very nice publication on MnnCorrect. Any thoughts would be very welcome.