As a general rule, block= is safer than design=. The former literally processes each block separately and combines the results, which allows us to handle differences in the mean-variance trend (in modelGeneVar()) or differences in variance between groups (in findMarkers()). The use of a design matrix causes these methods to switch to linear models, which make more assumptions about how similar the different blocking levels are. Nonetheless, design= may be necessary in some cases. For example, if you have a set-up where all cells in one cluster are in one blocking level and all cells in another cluster are in another level, it's not possible to compare those clusters by using block=. These points are discussed briefly in the documentation.
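To make the difference concrete, here is a rough sketch of the two styles of call. It runs on simulated data from scuttle::mockSCE(), and the batch and cluster factors are made up purely for illustration, so the results mean nothing biologically; treat it as a template rather than a recommended analysis.

```r
library(scran)
library(scuttle)

# Simulated data, for illustration only.
set.seed(100)
sce <- mockSCE(ncells = 200, ngenes = 1000)
sce <- logNormCounts(sce)

# Hypothetical annotations: two batches and three clusters,
# arranged so that every cluster has cells in every batch.
batch <- factor(rep(c("A", "B"), length.out = ncol(sce)))
cluster <- factor(rep(c("1", "2", "3"), length.out = ncol(sce)))

# block=: each batch is processed separately and the results are combined.
dec.block <- modelGeneVar(sce, block = batch)
markers.block <- findMarkers(sce, groups = cluster, block = batch)

# design=: the same calls switch to linear models.
design <- model.matrix(~batch)
dec.design <- modelGeneVar(sce, design = design)

# For findMarkers(), the design matrix should hold only the blocking terms,
# so the intercept column is dropped here.
markers.design <- findMarkers(sce, groups = cluster,
    design = design[, -1, drop = FALSE])
```

If a cluster were confined to a single batch, the block= comparison involving that cluster would have nothing to work with, which is the situation where design= becomes the fallback.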
Honestly speaking, I have mixed feelings about regressing out the cell cycle effect. It seemed like a good idea at the time, and everyone was doing it, so that's why I talked about it in the workflow. But I've become increasingly concerned that the cell cycle is not entirely orthogonal to biological processes of interest, and attempting to regress it out could cause more trouble than it's worth. For example, if one cell type cycles more actively than another, or if you're studying something like T cell activation, trying to regress out the cell cycle effect could cripple your signal (or even worse, introduce spurious signal). I've also had some nagging doubts about the accuracy of phase calls from cyclone() or related methods that are based on classifiers learnt from a single reference dataset - it's not hard to find situations where the test dataset involves different cell types that don't behave much like the reference w.r.t. cell cycle-associated genes.
If I had to do it, I would block on the phase assignments, but I'm starting to consider whether just tossing out all genes with annotated associations to the cell cycle would be a safer approach (e.g., based on all terms in GO:0007049, which pretty much covers anything that might be associated with the cell cycle). This won't get rid of unannotated genes that have expression correlated to the cell cycle, but any such effect is indistinguishable from the activity of a separate pathway that happens to be correlated to the cell cycle.
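For what it's worth, the gene-removal idea could look something like the sketch below. It assumes human data annotated with gene symbols (swap in org.Mm.eg.db for mouse), and the genes vector is a hypothetical stand-in for rownames(sce); I haven't tuned this in any way.

```r
library(org.Hs.eg.db)  # assumption: human data

# All genes annotated to GO:0007049 (cell cycle) or any of its child terms;
# the "GOALL" keytype includes annotations inherited from descendant terms.
cc.genes <- AnnotationDbi::select(org.Hs.eg.db, keys = "GO:0007049",
    keytype = "GOALL", columns = "SYMBOL")

# Hypothetical gene symbols standing in for rownames(sce).
genes <- c("CDK1", "CCNB1", "GAPDH", "ALB")

# Keep only genes without an annotated cell cycle association.
keep <- !genes %in% cc.genes$SYMBOL
genes[keep]

# With a real SingleCellExperiment, the equivalent would be:
# sce.nocc <- sce[!rownames(sce) %in% cc.genes$SYMBOL, ]
```

The filtered object can then go into the usual feature selection and clustering steps in place of the original.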