How to do downstream analysis on cells that are subsetted after MNN batch correction
@shangguandong1996-21805
China

Hi, I was wondering how to do downstream analysis on cells that are subsetted after MNN batch correction. According to OSCA, we can repeat the variance modelling and PCA on the subset, and then do UMAP and clustering.

But that document covers a single scRNA-seq dataset, and I am confused about cells integrated with fastMNN. According to some issues on the Bioconductor support site and GitHub, re-running fastMNN seems to be optional. But I am still confused about some steps.

For the first step (modelling the variance), I am sure I have to model the variance again, but I am not sure how. After all, these subsetted cells come from two or more scRNA-seq datasets. Should I use modelGeneVar with a blocking factor to mark the batches?

For the second step (PCA), should I use the corrected values from fastMNN or just the original log-counts? Should I run multiBatchNorm again?

Best wishes

Guandong Shang

batchelor fastMNN scRNAseq • 124 views
Aaron Lun ★ 27k
@alun
The city by the bay

In this case, you can just replace denoisePCA in the example with the various MNN steps. So:

# NOTE: not tested!
library(scran)      # modelGeneVar(), getTopHVGs()
library(batchelor)  # multiBatchNorm(), fastMNN()
sce.sub <- sce[,some_kind_of_subset]
dec.sub <- modelGeneVar(sce.sub, block=sce.sub$batch)
sce.sub <- multiBatchNorm(sce.sub, batch=sce.sub$batch)
sce.sub <- fastMNN(sce.sub, batch=sce.sub$batch, subset.row=getTopHVGs(dec.sub))


For the first step (modelling the variance), I am sure I have to model the variance again, but I am not sure how. After all, these subsetted cells come from two or more scRNA-seq datasets. Should I use modelGeneVar with a blocking factor to mark the batches?

Yes. Assuming that your subset of cells still contains multiple batches, you should specify the blocking factor.
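
To make the blocking step concrete, here is a minimal sketch on simulated data. The mock dataset from scuttle's mockSCE(), the batch labels "A" and "B", the every-other-cell subset and the choice of n=200 HVGs are all assumptions for illustration, not part of the original question.

```r
library(scuttle)
library(scran)

set.seed(100)
# Mock data standing in for a real multi-batch dataset (assumption).
sce <- mockSCE(ncells=200, ngenes=1000)
sce$batch <- rep(c("A", "B"), each=100)  # hypothetical batch labels
sce <- logNormCounts(sce)

# Arbitrary subset that still contains cells from both batches.
sce.sub <- sce[, seq(1, ncol(sce), by=2)]
stopifnot(length(unique(sce.sub$batch)) > 1)

# Variance is modelled within each batch and then combined across batches.
dec.sub <- modelGeneVar(sce.sub, block=sce.sub$batch)
hvgs <- getTopHVGs(dec.sub, n=200)

# Per-batch statistics are kept in the 'per.block' field.
colnames(dec.sub$per.block)
```

If the subset ends up containing only one batch, the block= argument can simply be dropped.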

For the second step (PCA), should I use the corrected values from fastMNN or just the original log-counts? Should I run multiBatchNorm again?

Use the log-counts; the corrected values don't have much value in a re-analysis.

It might help to pretend that your subsetted dataset is, in fact, your full matrix, and then re-apply all the steps that you would have done on the full matrix. Generally speaking, the algorithms involved don't care that you subsetted the dataset beforehand. The only exception is that we omit the QC step on the subsetted dataset, because we already did the QC on the full dataset and we don't want to lose more cells in the subset analysis. I also omitted re-running logNormCounts in the subset analysis, but that's largely for convenience; I don't think it would hurt if you re-did it.
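
As a rough sketch of "treat the subset as the full matrix", the steps can be chained through to clustering and UMAP. Everything here runs on simulated data: mockSCE(), the batch labels, the subset, n=200, d=20 and the walktrap clustering are illustrative assumptions, and the result is stored in a separate object (mnn.sub, a name chosen here) so that the log-counts in sce.sub remain available.

```r
library(scuttle)
library(scran)
library(batchelor)
library(scater)

set.seed(42)
# Mock data standing in for the QC'd, normalized full dataset (assumption).
sce <- mockSCE(ncells=200, ngenes=1000)
sce$batch <- rep(c("A", "B"), each=100)  # hypothetical batch labels
sce <- logNormCounts(sce)

# Subset, then redo the steps on the subset; QC is not repeated.
sce.sub <- sce[, seq(1, ncol(sce), by=2)]
dec.sub <- modelGeneVar(sce.sub, block=sce.sub$batch)
sce.sub <- multiBatchNorm(sce.sub, batch=sce.sub$batch)
mnn.sub <- fastMNN(sce.sub, batch=sce.sub$batch,
    subset.row=getTopHVGs(dec.sub, n=200), d=20)

# Downstream steps use the "corrected" low-dimensional coordinates from fastMNN.
g <- buildSNNGraph(mnn.sub, use.dimred="corrected")
mnn.sub$cluster <- factor(igraph::cluster_walktrap(g)$membership)
mnn.sub <- runUMAP(mnn.sub, dimred="corrected")
```

The cluster labels can then be copied back with something like sce.sub$cluster <- mnn.sub$cluster, so that any later per-gene analysis runs on the log-counts in sce.sub.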
