Cell cycle regression for scRNA-seq data
ATpoint ★ 1.6k
@atpoint-13662
Germany

My scRNA-seq data (10X, murine, hematopoietic cells) have the problem that some clusters are separated almost exclusively by cell cycle. This is not interesting for the scenario we are working with and only inflates the number of clusters. The effect shows up both in a PCA run on cell cycle genes (for some clusters the separation is obvious in PC1 vs PC2) and in the cyclone cell cycle assignment as in the book chapter 16.4. Therefore I would like to remove the effect, e.g. as in the book chapter 16.5. Removing the cell cycle genes from the selected features is not sufficient and does not really make any difference, so I am looking for a more aggressive strategy. Following chapter 16.4 I am not clear on the exact workflow from there on. Do we run regressBatches on the original logcounts and then repeat the feature selection, integration and clustering procedure? Also, is there something similar in the Bioconductor world as in the last chapter of the Seurat vignette where not the cell cycle effect itself but the difference between the G2M and S phase scores is regressed?

OSCA batchelor cell cycle regression • 1.2k views
Aaron Lun ★ 27k
@alun
The city by the bay

Do we run regressBatches on the original logcounts and then repeat the feature selection, integration and clustering procedure?

The latest version of the chapter has a bit more information available. Briefly, the regression just applies to the log-values you feed into the PCA. Clustering picks up from the PCs, so it doesn't need extra regression. And feature selection can use block= to ensure that cell cycle differences do not drive the detection of HVGs.
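For anyone following along, the order of steps described above could be sketched roughly like this. It is only a minimal sketch on simulated data from scuttle's mockSCE(), not the code from the book: the phase vector is a made-up stand-in for the labels that cyclone() would return, and the name "PCA.nocycle" is arbitrary.

```r
library(scuttle)     # mockSCE(), logNormCounts()
library(scran)       # modelGeneVar(), getTopHVGs(), clusterCells()
library(batchelor)   # regressBatches()
library(scater)      # calculatePCA()

set.seed(100)

# Simulated stand-in data. 'phase' is a hypothetical per-cell label,
# in practice this would come from e.g. cyclone().
sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))
phase <- factor(sample(c("G1", "S", "G2M"), ncol(sce), replace = TRUE))

# Feature selection with block= so that differences between phases
# do not drive the detection of HVGs.
dec <- modelGeneVar(sce, block = phase)
hvgs <- getTopHVGs(dec, n = 200)

# Regress the phase effect out of the log-values; the result is
# stored in the "corrected" assay of the returned object.
reg <- regressBatches(sce, batch = phase)

# PCA on the corrected values, restricted to the HVGs, and stored
# in the original object under an arbitrary name.
pcs <- calculatePCA(assay(reg, "corrected"),
                    subset_row = hvgs, ncomponents = 10)
reducedDim(sce, "PCA.nocycle") <- pcs

# Clustering picks up from the PCs; no further regression is applied.
clusters <- clusterCells(sce, use.dimred = "PCA.nocycle")
table(clusters)
```

The corrected values are only used as input to the PCA here; the original logcounts stay untouched in sce for anything downstream that should not see regressed values.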

I take it you've read and understood my comments on the potential problems from using regression, so I won't repeat them here. I will just say that I would still prefer gene removal as this is more predictable and less liable to introduce artifacts - see the new version of the chapter for a more aggressive empirical version of this approach.
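As a rough illustration of what an empirical gene removal could look like, here is a sketch that drops genes whose expression is noticeably associated with the phase labels. The assumptions are mine and not taken from this thread: simulated data from mockSCE(), a made-up phase column, and a 5% variance-explained cutoff that is purely illustrative and may not match what the chapter does.

```r
library(scuttle)   # mockSCE(), logNormCounts()
library(scater)    # getVarianceExplained()

set.seed(300)

# Simulated stand-in data with a hypothetical phase label per cell.
sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))
sce$phase <- factor(sample(c("G1", "S", "G2M"), ncol(sce), replace = TRUE))

# Percentage of variance in each gene's log-expression explained by phase.
pct <- getVarianceExplained(sce, variables = "phase")

# Flag genes above an illustrative 5% cutoff. Genes with zero variance
# give missing values and are not flagged.
discard <- !is.na(pct[, "phase"]) & pct[, "phase"] > 5

# Genes that remain eligible for feature selection and PCA.
candidates <- rownames(sce)[!discard]
length(candidates)
```

The flagged genes are only excluded from the feature set; they stay in the object and can still be inspected or used for marker detection later.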

Also, is there something similar in the Bioconductor world as in the last chapter of the Seurat vignette where not the cell cycle effect itself but the difference between the G2M and S phase scores is regressed?

Sure, if you've got a covariate, just make a design matrix and give it to design= in regressBatches(). (Similarly, you can give it to design= in functions like findMarkers().) You can put anything in there, e.g., the cyclone() phase scores or the SingleR() correlations. However, I have come to wonder whether this hurts more than it helps; the magnitude of the scores is probably even more sensitive to confounding differences in the biological state.
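A minimal sketch of the design= route, using the difference between the S and G2M scores as the covariate (in the spirit of the Seurat vignette mentioned in the question). The scores here are random numbers standing in for the $scores data frame from cyclone(), and the call without batch= assumes a batchelor version that includes the change described in the follow-up comment in this thread; on older versions a dummy batch=integer(ncol(sce)) was needed as well.

```r
library(scuttle)     # mockSCE(), logNormCounts()
library(batchelor)   # regressBatches()

set.seed(200)

# Simulated stand-in data.
sce <- logNormCounts(mockSCE(ncells = 200, ngenes = 1000))

# Hypothetical per-cell phase scores; in practice these would be
# taken from the output of cyclone().
scores <- data.frame(G1  = runif(ncol(sce)),
                     S   = runif(ncol(sce)),
                     G2M = runif(ncol(sce)))

# Covariate: difference between the S and G2M scores for each cell.
cc.diff <- scores$S - scores$G2M

# Design matrix with an intercept and the covariate, one row per cell.
design <- model.matrix(~cc.diff)

# Regress the covariate out of the log-values. Assumes a batchelor
# version where batch= is not required when design= is supplied.
reg <- regressBatches(sce, design = design)
assayNames(reg)
```

The same design matrix could be extended with further columns (e.g. all three scores) if more than one covariate should be removed, with the caveat about score magnitudes raised above.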

Thanks Aaron for the extensive comment, very helpful as usual!

I just noticed that setting design= in regressBatches() also requires you to pass something like batch=integer(ncol(sce)) to keep the function happy. (It doesn't matter what the exact value is, you just have to give it something so that it moves on to the next step.) I've updated the function in BioC-devel so that it no longer needs batch= if you give it design=.