I am using reducedMNN with NMF as input to perform batch correction. To better understand what is happening, I have gone through the fastMNN theory document that you posted on GitHub, but one point confuses me. In one place, it says that each cell in the target batch gets its own correction vector, obtained by averaging over all MNN pairs between that cell and cells in the reference batch. These MNN pairs capture local variation among the subpopulations of the target batch. But the batch vector, i.e. the component of the correction vector that is actually removed, is the same for all cells. Doesn't that mean the locality of the correction is disregarded and the batch effect is assumed to be the same across all cells?
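To check my reading, here is a toy sketch in plain NumPy of the two quantities as I understand them. It is not batchelor's code, and the function names are my own (hypothetical). Per-cell correction vectors are averaged over each target cell's own MNN pairs, and a single batch vector is averaged over all pairs. As far as I know, batchelor also smooths the per-cell vectors across neighbouring cells so that unpaired cells get a correction too; this sketch omits that and leaves unpaired cells as NaN. The two subpopulations are shifted in opposite directions in the target batch.

```python
import numpy as np

def knn_indices(query, data, k):
    """Indices of the k nearest rows of `data` for each row of `query` (brute-force Euclidean)."""
    d = np.linalg.norm(query[:, None, :] - data[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

def mnn_pairs(ref, tgt, k):
    """(target index, reference index) pairs that are in each other's k nearest neighbours."""
    tgt_to_ref = knn_indices(tgt, ref, k)  # k nearest reference cells per target cell
    ref_to_tgt = knn_indices(ref, tgt, k)  # k nearest target cells per reference cell
    return [(i, int(j)) for i in range(len(tgt))
            for j in tgt_to_ref[i] if i in ref_to_tgt[j]]

def per_cell_vectors(ref, tgt, pairs):
    """Correction for each target cell: mean of (ref[j] - tgt[i]) over that cell's own pairs.
    Cells with no MNN pair stay NaN (no smoothing here, unlike batchelor)."""
    vecs = np.full(tgt.shape, np.nan)
    for i in {i for i, _ in pairs}:
        js = [j for ii, j in pairs if ii == i]
        vecs[i] = (ref[js] - tgt[i]).mean(axis=0)
    return vecs

def batch_vector(ref, tgt, pairs):
    """One vector for the whole batch: mean of (ref[j] - tgt[i]) over all MNN pairs."""
    return np.mean([ref[j] - tgt[i] for i, j in pairs], axis=0)

# Toy 2-D data: subpopulations A (rows 0-19) and B (rows 20-39),
# shifted in opposite directions along y in the target batch.
rng = np.random.default_rng(0)
ref = np.vstack([rng.normal([0, 0], 0.3, (20, 2)), rng.normal([10, 0], 0.3, (20, 2))])
tgt = np.vstack([rng.normal([0, 3], 0.3, (20, 2)), rng.normal([10, -3], 0.3, (20, 2))])

pairs = mnn_pairs(ref, tgt, k=5)
vecs = per_cell_vectors(ref, tgt, pairs)
bv = batch_vector(ref, tgt, pairs)
print("mean per-cell vector, A:", np.nanmean(vecs[:20], axis=0))  # y-component negative
print("mean per-cell vector, B:", np.nanmean(vecs[20:], axis=0))  # y-component positive
print("single batch vector:   ", bv)
```

In this toy setup the per-cell vectors point in opposite directions for the two subpopulations, while the single batch vector blends them into one value for every cell, which is the situation my question is about.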
If that is correct, and the effect being corrected is indeed the same across all subpopulations, is there a way to modify the approach to correct subpopulations individually? I tried splitting the target batch into new batches based on the clusters obtained within it, and then running reducedMNN with each cluster as an independent batch. However, instead of each cluster integrating with its own counterpart population in the reference, the target clusters end up overlapping with each other. I assume this happens because the subpopulations in the target batch share some similarity, so in my approach they form spurious MNN pairs with each other, which leads to an incorrect correction that merges what are actually distinct clusters.
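To make the idea concrete, here is a hypothetical sketch of what I mean by correcting subpopulations individually. It uses plain NumPy on toy 2-D data; it is not reducedMNN and not something I know batchelor to provide, and the function names are my own. Each target cluster is matched only against the reference, never against the other target clusters, and is shifted by the simple mean of its own MNN-pair difference vectors (no kernel smoothing as in fastMNN).

```python
import numpy as np

def mnn_pairs(ref, tgt, k):
    """(target index, reference index) pairs that are in each other's k nearest neighbours."""
    d = np.linalg.norm(tgt[:, None, :] - ref[None, :, :], axis=2)  # shape (n_tgt, n_ref)
    t2r = np.argsort(d, axis=1)[:, :k]    # k nearest reference cells per target cell
    r2t = np.argsort(d.T, axis=1)[:, :k]  # k nearest target cells per reference cell
    return [(i, int(j)) for i in range(len(tgt)) for j in t2r[i] if i in r2t[j]]

def correct_by_cluster(ref, tgt, labels, k=5):
    """Hypothetical per-cluster correction: each target cluster is matched only
    against the reference (never against the other target clusters) and shifted
    by the mean difference vector over its own MNN pairs."""
    out = tgt.astype(float).copy()
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)
        sub = tgt[idx]
        pairs = mnn_pairs(ref, sub, k)
        if pairs:  # a cluster with no MNN pairs is left uncorrected
            out[idx] += np.mean([ref[j] - sub[i] for i, j in pairs], axis=0)
    return out

# Toy data: the two target clusters are shifted in opposite directions.
rng = np.random.default_rng(1)
ref = np.vstack([rng.normal([0, 0], 0.3, (20, 2)), rng.normal([10, 0], 0.3, (20, 2))])
tgt = np.vstack([rng.normal([0, 3], 0.3, (20, 2)), rng.normal([10, -3], 0.3, (20, 2))])
labels = np.repeat(["A", "B"], 20)  # cluster labels computed within the target batch

corrected = correct_by_cluster(ref, tgt, labels)
print("A centre before/after:", tgt[:20].mean(axis=0), corrected[:20].mean(axis=0))
print("B centre before/after:", tgt[20:].mean(axis=0), corrected[20:].mean(axis=0))
```

Because each cluster only ever looks for partners in the reference, the target clusters cannot be merged with each other here. The flip side is that every cluster is pulled toward its nearest reference cells even if it has no true counterpart in the reference.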
The idea I am testing is that cells within a cluster share a more similar batch effect than the sample as a whole, so correcting per cluster would avoid over-correcting the entire cell population. This seems especially relevant when the batch effect is both technical (platform/different experiments) and biological (sex/species).
Thanks for your response!