Hi,
I was wondering what the best approach would be for clustering a subset of cells pulled out of an MNN-batch-corrected object.
I used fastMNN from SeuratWrappers to perform an MNN batch correction and an integrated analysis.
Then I subsetted a cluster of cells and wanted to perform re-clustering.
Should I perform another round of MNN batch-correction on the subsetted object or proceed with the usual analytical workflow (e.g. Seurat, scran)?
Thank you.
tl;dr I don't think it's necessary but there probably isn't any harm in doing so either.
By default, I would just keep things simple and go ahead with the rest of the analysis without another round of correction. This saves you time on both coding and computation.
I would only do it again if there was still some batch effect in the subset. This is possible because fastMNN() uses information from all cells to help remove the batch effect, and if the "direction" of the subset's batch effect is different from the other cells, then you'll get incomplete removal. Re-correcting might get you a better-looking merge in terms of improved mixing between batches, assuming that the cells are genuinely of the same type.
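To make the point about the "direction" of the batch effect concrete, here is a minimal toy sketch in Python/NumPy. It is not how fastMNN() computes its correction: it swaps in a much cruder technique, a single mean-shift vector estimated from all cells, purely to illustrate why a correction informed by all cells can leave a residual batch effect in one subset. The data, the names (`centers`, `shifts`, `batch_gap`) and the two hypothetical cell types "A" and "B" are all assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # cells per (batch, cell type) group; hypothetical toy data

# Two cell types in a 2-D "expression" space.
centers = {"A": np.array([0.0, 0.0]), "B": np.array([5.0, 5.0])}
# Batch 2 is shifted relative to batch 1, in a different direction per cell type.
shifts = {"A": np.array([1.0, 0.0]), "B": np.array([0.0, 1.0])}

def simulate(cell_type, shifted):
    """Draw n cells of one type, optionally adding that type's batch shift."""
    offset = shifts[cell_type] if shifted else np.zeros(2)
    return centers[cell_type] + offset + rng.normal(scale=0.1, size=(n, 2))

batch1 = {t: simulate(t, False) for t in centers}
batch2 = {t: simulate(t, True) for t in centers}

def batch_gap(x1, x2):
    """Distance between the batch means: a crude measure of residual batch effect."""
    return np.linalg.norm(x2.mean(axis=0) - x1.mean(axis=0))

# Crude "global" correction: one shift vector estimated from all cells.
all1 = np.vstack([batch1["A"], batch1["B"]])
all2 = np.vstack([batch2["A"], batch2["B"]])
global_shift = all2.mean(axis=0) - all1.mean(axis=0)

# Subset to cell type A after the global correction: a batch gap remains,
# because the global vector averages over two different directions.
gap_global = batch_gap(batch1["A"], batch2["A"] - global_shift)

# Re-correct the subset, starting from the ORIGINAL (uncorrected) values.
subset_shift = batch2["A"].mean(axis=0) - batch1["A"].mean(axis=0)
gap_subset = batch_gap(batch1["A"], batch2["A"] - subset_shift)

print(f"residual gap in subset A, global correction:    {gap_global:.3f}")
print(f"residual gap in subset A, subset re-correction: {gap_subset:.3f}")
```

In this toy, the subset's residual gap is clearly non-zero after the global correction and vanishes once the shift is re-estimated from the subset alone. Real MNN correction is locally adaptive, so the effect is usually much milder, which is why checking the subset for remaining batch structure comes before deciding to re-correct.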
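If you must correct again for your subset of cells, make sure you do it from the original log-count values, not from the already-corrected values that are returned by fastMNN(). The corrected values have already been altered by the first round of correction, so re-correcting them would compound any distortions it introduced.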
Also consider the discussion here: https://github.com/MarioniLab/FurtherMNN2018/issues/6. This is not particularly relevant if your subset contains a group of similar cells, but if you are subsetting by other data-independent factors (e.g., experimental condition) it may be important to keep in mind.
Hi, Aaron.
I noticed that in OSCA you repeat the variance modelling and PCA on the subset.
But how can I do this with fastMNN data? Which values should I use for modelGeneVar, runPCA and clustering: the original log-count values or the corrected values?