Does MNN remove the same average batch vector from all cells, or does each cell have its own correction vector?
p.joshi ▴ 20
@pjoshi-22718
Last seen 7 weeks ago
Germany

Hi Aaron

I am using reducedMNN with NMF as input to perform batch correction. To better understand what is happening, I have gone through the fastMNN theory that you posted on GitHub, but one point confuses me. In one place, it says that a correction vector is identified for each cell in the target batch by averaging the correction vectors from all MNN pairs of that cell with cells in the reference batch, so the MNN pairs capture local variation within subpopulations of the target batch. But then the batch vector, the component of the correction vector that is actually removed, is the same for all cells. Doesn't that mean that the locality of the correction is disregarded and the batch effect is assumed to be the same across all cells?

If it is indeed correct that the corrected effect is the same across all subpopulations, is there a way to modify the approach to correct subpopulations individually? I tried splitting the target batch into new batches based on the clusters obtained within it, and then running reducedMNN with those clusters as independent batches. However, the target clusters end up overlapping with each other, rather than each cluster integrating with a separate population in the reference. I assume this happens because the subpopulations in the target batch share some similarity, so incorrect MNN pairs are identified between them in my approach, leading to incorrect correction and the merging of what are distinct clusters.

The idea I am testing is that cells within a cluster share a more similar batch effect than the sample as a whole, so correcting per cluster would avoid over-correcting the entire cell population. This matters especially when the batch effect is both technical (platform/different experiment) and biological (sex/species).

Thanks for your response!

fastmnn batchelor BatchEffect • 190 views
Aaron Lun ★ 27k
@alun
Last seen 7 hours ago
The city by the bay

But then the batch vector, the component of the correction vector that is actually removed, is the same for all cells. Doesn't that mean that the locality of the correction is disregarded and the batch effect is assumed to be the same across all cells?

The average batch effect vector is only used for the orthogonalization, a.k.a., removing all variance along the batch vector. After that's done, the actual correction itself is done using cell-specific vectors. (Well, averaged over neighboring cells, but it should be more or less local to a subpopulation.)
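To make the two steps concrete, here is a minimal pure-Python sketch on toy 2D data. It is not batchelor's implementation. The function names, the `bandwidth` parameter, the Gaussian weighting used for the neighbor averaging, and the hand-written MNN pairs are all assumptions made for illustration. It only shows the shape of the idea: one average batch vector is used to remove within-batch variance along that direction, and then each target cell gets its own locally averaged correction vector.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]
def scale(a, s): return [x * s for x in a]
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def center_along(cells, unit):
    """Remove within-batch variance along `unit` by giving every cell
    the batch's mean projection onto that direction."""
    projections = [dot(c, unit) for c in cells]
    centre = sum(projections) / len(projections)
    return [add(c, scale(unit, centre - p)) for c, p in zip(cells, projections)]

def correct(reference, target, pairs, bandwidth=1.0):
    """Toy two-step correction. `pairs` holds (reference_index, target_index)
    MNN pairs, assumed to be already known. `bandwidth` is a hypothetical
    smoothing parameter for the Gaussian weights."""
    # Step 1: a single average batch vector over all MNN pairs.
    diffs = [sub(reference[i], target[j]) for i, j in pairs]
    avg = [sum(col) / len(diffs) for col in zip(*diffs)]
    unit = scale(avg, 1.0 / math.sqrt(dot(avg, avg)))

    # Step 2: orthogonalization, applied within each batch separately.
    ref_o = center_along(reference, unit)
    tgt_o = center_along(target, unit)

    # Step 3: cell-specific correction vectors, each a weighted average
    # of the pair vectors anchored at nearby MNN-paired target cells.
    anchored = [(tgt_o[j], sub(ref_o[i], tgt_o[j])) for i, j in pairs]
    vectors, corrected = [], []
    for cell in tgt_o:
        weights = [math.exp(-math.dist(cell, anchor) ** 2 / (2 * bandwidth ** 2))
                   for anchor, _ in anchored]
        total = sum(weights)
        vec = [sum(w * v[k] for w, (_, v) in zip(weights, anchored)) / total
               for k in range(len(cell))]
        vectors.append(vec)
        corrected.append(add(cell, vec))
    return unit, vectors, corrected

# Two subpopulations; the target is shifted by (2, 0) in the first
# and by (2, 1) in the second, so the batch effect is not uniform.
reference = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2]]
target = [[2.0, 0.0], [2.2, 0.1], [7.0, 6.0], [7.1, 6.2]]
pairs = [(0, 0), (1, 1), (2, 2), (3, 3)]  # assumed MNN pairs

unit, vectors, corrected = correct(reference, target, pairs)
for cell_vector in vectors:
    print([round(x, 3) for x in cell_vector])
```

In this toy case, every correction vector has the same component along the average batch vector, while the component orthogonal to it differs between the two subpopulations. That is the sense in which the correction stays local even though the orthogonalization uses one shared vector.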

In theory, it should be possible to perform orthogonalization on a per-population basis, which would give even better corrections. But I haven't had time to test it out.

I tried splitting the target batch into new batches based on the clusters obtained within it, and then running reducedMNN with those clusters as independent batches. However, the target clusters end up overlapping with each other, rather than each cluster integrating with a separate population in the reference.

I don't really understand what you mean here, but the MNN approach assumes that, at the very least, there is one common subpopulation across batches. So if you subset your various batches and pair the wrong subsets together, the algorithm will attempt to merge them (as will any correction algorithm, really) and that won't make a lot of sense.
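As a rough illustration of why mismatched subsets still get merged, here is a small pure-Python sketch of mutual nearest neighbor detection on toy 2D points. The helper names and the data are hypothetical, and this is a brute-force search rather than what batchelor does internally. The point is only that MNN pairs are found even when the target subset has no counterpart in the reference, and any correction built on those pairs will pull the two together.

```python
import math

def nearest(query, candidates, k):
    """Indices of the k candidates closest to `query` (Euclidean distance)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(query, candidates[i]))
    return set(order[:k])

def find_mnn_pairs(reference, target, k=1):
    """Return (reference_index, target_index) pairs where each cell is
    among the other's k nearest neighbors in the opposite batch."""
    ref_nn = [nearest(cell, target, k) for cell in reference]
    tgt_nn = [nearest(cell, reference, k) for cell in target]
    return [(i, j)
            for i in range(len(reference))
            for j in sorted(ref_nn[i])
            if i in tgt_nn[j]]

# Reference with two populations, near (0, 0) and near (5, 5).
reference = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 5.1]]

# Target subset that shares both populations with the reference.
target_shared = [[1.0, 0.0], [6.0, 5.0]]

# Target subset from a population that is absent from the reference.
target_unshared = [[10.0, 0.0], [10.2, 0.1]]

shared_pairs = find_mnn_pairs(reference, target_shared)
unshared_pairs = find_mnn_pairs(reference, target_unshared)
print(shared_pairs)    # [(1, 0), (3, 1)]
print(unshared_pairs)  # [(3, 0)]
```

With `k=1`, the closest cross-batch pair of cells is always mutual (barring ties), so the pair list is never empty. The algorithm cannot tell from the pairs alone that the second subset does not belong with any reference population.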

The idea I am testing is that cells within a cluster share a more similar batch effect than the sample as a whole, so correcting per cluster would avoid over-correcting the entire cell population.

Again, I don't really understand what you mean.


Thanks Aaron for your reply.

So by individual correction, do you mean that you obtain a vector orthogonal to the average batch vector, and that this is the correction for each individual cell?

Let me try to paraphrase the two things that weren't clear to you.

Imagine there are 3 distinct clusters of oligodendrocytes in a human sample, and I am trying to merge them into a larger mouse oligodendrocyte atlas (for annotation or trajectory analysis). The clusters could differ because of different marker genes, or for technical reasons such as read depth (for example, cycling cells seem to have a lower number of reads, in my experience). My hypothesis is that these distinct clusters also have distinct batch correction vectors with respect to the reference, and that a proper integration should take this into account. So instead of averaging the batch correction across the entire population, it should be averaged within each of these individual clusters.

But if I divide those three clusters into three batches and integrate each batch serially with the reference, the 3 clusters get integrated into a mix in which the distinction that originally existed between them disappears. In a serial batch correction, the last two batches (clusters) find the cells of the first batch (cluster), which is already integrated with the reference, much closer than the rest of the reference.

Does that make sense now? If not, that's all right. My main aim was to ask whether the correction vector is the same for all cells or cell-specific, and you suggest it is cell-specific.
