My scRNA-seq data (10X, murine, hematopoietic cells) have the problem that some clusters are separated almost exclusively by cell cycle, which is not interesting for the scenario we are working with and only inflates the number of clusters.
This can be shown both with a PCA run on cell cycle genes (the separation is obvious in PC1 vs PC2 for some clusters) and with cyclone cell cycle assignment as in book chapter 16.4. Therefore I would like to remove the effect, e.g. as in book chapter 16.5. Removing the cell cycle genes from the selected features is not sufficient and does not really make any difference, so I am looking for a more aggressive strategy. Following chapter 16.4, I am not clear on the exact workflow from there on. Do we run regressBatches on the original logcounts and then repeat the feature selection, integration and clustering procedure?
Also, is there something in the Bioconductor world similar to the last chapter of the Seurat vignette, where not the cell cycle effect itself but the difference between the G2M and S phase scores is regressed out?
Thanks for your suggestions!
Thanks Aaron for the extensive comment, very helpful as usual!
I just noticed that setting design= in regressBatches() actually also requires you to give it something like batch=integer(ncol(sce)) to keep the function happy. (It doesn't matter what the exact value is, you just had to give it something to let it move on to the next step.) I've updated the function in BioC-devel so that it no longer needs batch= if you give it design=.
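
For anyone trying this out, here is a minimal sketch of the call described above. It is an illustration only: the simulated matrix logexprs and the scores data frame are hypothetical stand-ins (in practice the scores would come from something like the cyclone output), and the choice of G1 and G2M as covariates is an assumption, not a recommendation from this thread.

```r
library(batchelor)

# Hypothetical stand-in data: 100 genes x 40 cells of log-expression values.
set.seed(100)
logexprs <- matrix(rnorm(100 * 40), nrow = 100,
                   dimnames = list(paste0("gene", 1:100), paste0("cell", 1:40)))

# Hypothetical per-cell cell cycle scores (assumed covariates for illustration).
scores <- data.frame(G1 = runif(40), G2M = runif(40))

# Design matrix with one row per cell: intercept plus the score covariates.
design <- model.matrix(~ G1 + G2M, data = scores)

# Placeholder batch= supplied alongside design=, as described in the comment above.
reg <- regressBatches(logexprs, batch = integer(ncol(logexprs)), design = design)

# The regressed values are stored in the "corrected" assay of the returned object.
corrected <- SummarizedExperiment::assay(reg, "corrected")
dim(corrected)
```

With a batchelor version that includes the update mentioned above, the batch= argument can presumably be dropped when design= is given.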