11 weeks ago by
Cambridge, United Kingdom
The classifier is based on the ranks of expression within each cell, so it is robust to increases or decreases in that cell's library size. The real question is whether it is also robust to changes in relative coverage between genes when you switch to a different technology. For example, gene A may have higher counts than gene B in full-length protocols, but fewer counts in 3'-based protocols. I don't have a good idea of how badly the classifier is affected by such differences, but the original paper did see decent performance across a range of datasets generated with different scRNA-seq protocols, so it's probably okay.
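To make the distinction concrete, here is a minimal Python sketch (not the cyclone implementation) showing that within-cell ranks survive a change in library size but not a gene-specific change in coverage. The gene names, counts, scaling factor and per-gene bias values are all made up for illustration.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


# Hypothetical counts for one cell (illustrative values only).
genes = ["geneA", "geneB", "geneC", "geneD"]
counts = [12, 3, 40, 3]

# A change in library size multiplies every gene by the same factor.
deeper = [c * 2.5 for c in counts]

# A change in protocol can shift coverage gene by gene (assumed bias factors).
gene_bias = [0.1, 1.0, 1.0, 1.0]
other_protocol = [c * b for c, b in zip(counts, gene_bias)]

original_ranks = ranks(counts)
deeper_ranks = ranks(deeper)
other_protocol_ranks = ranks(other_protocol)

print(dict(zip(genes, original_ranks)))
print(dict(zip(genes, deeper_ranks)))
print(dict(zip(genes, other_protocol_ranks)))
```

The first two dictionaries are identical, while the third one moves geneA from above geneB to below it, which is exactly the situation that could trip up a rank-based classifier.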
If that doesn't work out for you, another cheap approach would be to get a list of marker genes for each phase and, for each individual cell, perform Wilcoxon tests between each pair of phases using the expression of their markers in that cell. Each cell is then assigned to the phase whose marker genes have the highest expression. The challenge becomes where to get these markers: it's hard to find a curated reference source, and KEGG is less than helpful. GO provides some genes for G1 (GO:0000080), S (GO:0000084), G1/S (GO:0000082) and G2/M (GO:0000086), so this might be a good place to start. Of course, you could just pull some markers from some random experiment, as is done by cyclone. However, I find this rather unappealing, and I would have thought that we would have better reference annotation for such a well-studied biological process.
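As a rough sketch of the marker-based approach, the Python below compares the marker sets of each pair of phases within a single cell using the Mann-Whitney U statistic (the statistic behind the Wilcoxon rank-sum test) and assigns the cell to the phase that wins most convincingly. The marker names and expression values are placeholders rather than real annotation, `assign_phase` is a hypothetical helper, and averaging the pairwise scores is just one possible way to combine them. It reports the scaled statistic rather than a p-value; if you want p-values, `scipy.stats.mannwhitneyu` can be substituted for the hand-written statistic.

```python
def u_statistic(x, y):
    """Mann-Whitney U of x versus y: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def assign_phase(cell_expr, markers):
    """Assign one cell to the phase whose markers rank highest.

    cell_expr maps gene name -> expression in this cell.
    markers maps phase name -> list of marker genes for that phase.
    Each phase is scored by U / (n_x * n_y) against every other phase,
    averaged over the other phases (an assumed way of combining the pairs).
    """
    values = {
        phase: [cell_expr[g] for g in gene_list if g in cell_expr]
        for phase, gene_list in markers.items()
    }
    if any(len(v) == 0 for v in values.values()):
        raise ValueError("every phase needs at least one detected marker")

    scores = {}
    for phase, x in values.items():
        pairwise = [
            u_statistic(x, y) / (len(x) * len(y))
            for other, y in values.items()
            if other != phase
        ]
        scores[phase] = sum(pairwise) / len(pairwise)
    best_phase = max(scores, key=scores.get)
    return best_phase, scores


# Placeholder marker sets; real ones would come from GO or another source.
markers = {
    "G1": ["g1_a", "g1_b", "g1_c"],
    "S": ["s_a", "s_b", "s_c"],
    "G2M": ["g2m_a", "g2m_b", "g2m_c"],
}

# Made-up log-expression values for a single cell.
cell = {
    "g1_a": 1.0, "g1_b": 0.5, "g1_c": 2.0,
    "s_a": 6.0, "s_b": 5.5, "s_c": 7.0,
    "g2m_a": 2.5, "g2m_b": 0.0, "g2m_c": 1.5,
}

phase, scores = assign_phase(cell, markers)
print(phase, scores)
```

Because the comparison only uses ranks within the cell, it has the same robustness to library size as the classifier, and the same potential sensitivity to protocol-specific coverage differences between genes.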