Hi!
I am working on 10x Genomics scRNA-seq data comprising one wild-type sample (wt), one experimental sample that is similar to wild type (mut1), and one experimental sample (mut2) in which about one third of the cells (~3,000) are very dissimilar to the other two samples: they form their own clusters when I cluster the batch-corrected, combined dataset.
In my analysis I am following the Bioconductor OSCA workflow, in which each step is excellently explained. However, I am a little confused about the correct way to do normalization by deconvolution with scran's "computeSumFactors" function.
My question is whether I should run "computeSumFactors" on each sample separately and then combine the individually normalized samples into a common SingleCellExperiment object, or whether I should normalize all three samples together.
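To make the two options concrete, here is a rough, untested sketch of what I mean. It uses toy data from scuttle's "mockSCE" in place of my real samples; the object names (sce.list, per.sample, sce.all) and the "sample" column are placeholders I made up, and I am assuming that "multiBatchNorm" from batchelor is the appropriate way to put separately normalized samples on a common scale.

```r
library(scuttle)    # mockSCE(), logNormCounts()
library(scran)      # quickCluster(), computeSumFactors()
library(batchelor)  # multiBatchNorm()

set.seed(100)

# Toy stand-ins for my three samples (placeholders, not my real data).
sce.list <- list(wt = mockSCE(), mut1 = mockSCE(), mut2 = mockSCE())

## Option A: deconvolution size factors computed within each sample.
normalizeOne <- function(sce) {
    clusters <- quickCluster(sce)
    computeSumFactors(sce, clusters = clusters)
}
per.sample <- lapply(sce.list, normalizeOne)

# Rescale the per-sample size factors and compute log-normalized values
# that are comparable across the three samples.
per.sample <- multiBatchNorm(per.sample$wt, per.sample$mut1, per.sample$mut2)

## Option B: combine first, then compute size factors across all cells.
combined <- sce.list
for (nm in names(combined)) {
    combined[[nm]]$sample <- nm  # remember where each cell came from
    colnames(combined[[nm]]) <- paste(nm, colnames(combined[[nm]]), sep = "_")
}
sce.all <- do.call(cbind, unname(combined))

# Pre-cluster within each sample, then normalize all cells together.
clusters <- quickCluster(sce.all, block = sce.all$sample)
sce.all <- computeSumFactors(sce.all, clusters = clusters)
sce.all <- logNormCounts(sce.all)
```

As far as I understand, in option A the size factors are only comparable within a sample until "multiBatchNorm" rescales them, whereas in option B they are on a common scale from the start.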
Initially, I thought I should do it for each sample separately, because the OSCA chapter on batch correction (linkToOSCA) says: "As a general rule, these upstream processing steps (quality control and normalization) should be done within each batch where possible."
On the other hand, in the OSCA workflows section, normalization of a dataset with an experimental setting similar to mine is done across all samples using scran's "computeSumFactors" (linkToOSCAworkflows_MessmerData). Furthermore, in an scRNA-seq tutorial by an OSCA co-author, normalization was again done across all samples rather than sample by sample (linkToStephanieHicks_bioinfosummer_2018).
Could anybody please give me some guidance on the pros and cons of the two strategies described above, particularly with respect to my dataset?
Many thanks and best wishes -
rg