Variance-stabilizing transformation (VST) has been available in DESeq2 for years (and as a more general concept long before that) and works well with bulk RNA-seq. However, I haven't seen it applied to single-cell RNA-seq; there is an exploratory blog post by Valentine Svensson from 2017, but that seems to be it. For normalization purposes, scRNA-seq should really be split into read-count and UMI-count categories. For UMI counts, a variant of VST is implemented in SCTransform. For read counts, why not use the classic DESeq2 implementation? It may not be perfect, but neither is log(x+1), which currently seems to be the default. Is it just a speed issue (it may take a while for hundreds or thousands of samples)?
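To make this concrete, here is roughly what I have in mind. This is only a sketch on simulated counts; the simulation parameters are arbitrary, and the "poscounts" size factors are my own choice to get around the fact that the default median-of-ratios estimator errors out when every gene contains a zero.

```r
library(DESeq2)

## Simulated read-count matrix (genes x cells); parameters are arbitrary.
set.seed(1)
counts <- matrix(rnbinom(2000 * 100, mu = 10, size = 0.5), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:100)))

## The current default: library-size scaling followed by log(x + 1).
sf <- colSums(counts) / mean(colSums(counts))
logcounts <- log2(t(t(counts) / sf) + 1)

## The classic DESeq2 VST. "poscounts" size factors avoid the error thrown by
## the default median-of-ratios estimator when every gene has at least one zero.
dds <- DESeqDataSetFromMatrix(counts,
                              colData = data.frame(sample = colnames(counts)),
                              design = ~ 1)
dds <- estimateSizeFactors(dds, type = "poscounts")
vst_mat <- assay(vst(dds, blind = TRUE))
```

Both logcounts and vst_mat could then go into the usual PCA/clustering steps, which is where I would expect any differences to show up.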
I have some thoughts here: https://ltla.github.io/SingleCellThoughts/general/transformation.html. It is mostly written for UMI count data, but some of the principles are applicable to read count data; the major difference between the two, for the purposes of this discussion, is that the increased overall noise of read counts makes it more difficult to observe systematic technical biases.
That is a good resource, though I'm not sure whether it offers more answers than questions.
My (personal) conclusion was that, for most applications, it doesn't really matter and the log-transformation is fine. In fact, it's better than fine, because its simplicity means that it is fast and reliable and its limitations are well understood. For everything else, it's much harder to be sure that it will run without error (even DESeq2::vst, as mature as it is, threw an error in my tests above) and that the artifacts will be predictable (who knows what the distances will be after running sctransform::vst?). Of course, there are problems with the log-transformation as well (see comments here), but it has served well as a default for all of my real analyses.
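To make that comparison concrete, here is a rough sketch on simulated UMI counts; the simulation and the simple library-size scaling are purely illustrative, not a recommendation.

```r
library(sctransform)

## Simulated UMI counts (genes x cells); the per-gene means are arbitrary.
set.seed(1)
gene_means <- 2^runif(2000, -2, 4)
umi <- matrix(rpois(2000 * 200, lambda = gene_means), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:200)))

## The log-based default: library-size scaling, then log(x + 1).
sf <- colSums(umi) / mean(colSums(umi))
logcounts <- log2(t(t(umi) / sf) + 1)

## Pearson residuals from sctransform; the $y matrix is what downstream
## steps (PCA, neighbor searches, distances) would consume.
pearson_res <- sctransform::vst(umi)$y

## Compare cell-cell Euclidean distances under the two transformations;
## their scale and spread will generally differ.
summary(as.vector(dist(t(logcounts))))
summary(as.vector(dist(t(pearson_res))))
```

The point is not that either set of distances is wrong, only that the behaviour of the log-transformed values is easy to reason about, while the residuals need more care to interpret.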