Variance-stabilizing transformation (VST) has been available in DESeq2 for years (and as a more general concept long before that) and works well with bulk RNA-seq. However, I haven't seen it applied to single-cell RNA-seq; there is an exploratory blog post by Valentine Svensson from 2017, but that seems to be it. For normalization purposes, scRNA-seq should really be split into read-count and UMI-count categories. For UMI counts, a variant of VST is implemented in SCTransform. For read counts, why not use the classic DESeq2 implementation? It may not be perfect, but neither is log(x+1), which currently seems to be the default. Is it just a speed issue (it may take a while for hundreds or thousands of samples)?
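To make this concrete, here is roughly what I have in mind. This is only a sketch on simulated counts; the simulation parameters are arbitrary, and the "poscounts" size factors are my own choice to get around the fact that the default median-of-ratios estimator errors out when every gene contains a zero.

```r
library(DESeq2)

## Simulated read-count matrix (genes x cells); parameters are arbitrary.
set.seed(1)
counts <- matrix(rnbinom(2000 * 100, mu = 10, size = 0.5), nrow = 2000,
                 dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:100)))

## The current default: library-size scaling followed by log(x + 1).
sf <- colSums(counts) / mean(colSums(counts))
logcounts <- log2(t(t(counts) / sf) + 1)

## The classic DESeq2 VST. "poscounts" size factors avoid the error thrown by
## the default median-of-ratios estimator when every gene has at least one zero.
dds <- DESeqDataSetFromMatrix(counts,
                              colData = data.frame(sample = colnames(counts)),
                              design = ~ 1)
dds <- estimateSizeFactors(dds, type = "poscounts")
vst_mat <- assay(vst(dds, blind = TRUE))
```

Both logcounts and vst_mat could then go into the usual PCA/clustering steps, which is where I would expect any differences to show up.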
I have some thoughts here: https://ltla.github.io/SingleCellThoughts/general/transformation.html. It is mostly written for UMI count data, but some of the principles are applicable to read count data; the major difference between the two, for the purposes of this discussion, is that the increased overall noise of read counts makes it more difficult to observe systematic technical biases.
That is a good resource, though I'm not sure whether it offers more answers than questions.
My (personal) conclusion was that, for most applications, it doesn't really matter and the log-transformation is fine. In fact, it's better than fine, because its simplicity means that it is fast and reliable and its limitations are well understood. For everything else, it's much harder to be sure that it will run without error (even DESeq2::vst, as mature as it is, threw an error in my tests above) and that the artifacts will be predictable (who knows what the distances will be after running sctransform::vst?). Of course, there are problems with the log-transformation as well (see comments here), but it has served well as a default for all of my real analyses.
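To make that comparison concrete, here is a rough sketch on simulated UMI counts; the simulation and the simple library-size scaling are purely illustrative, not a recommendation.

```r
library(sctransform)

## Simulated UMI counts (genes x cells); the per-gene means are arbitrary.
set.seed(1)
gene_means <- 2^runif(2000, -2, 4)
umi <- matrix(rpois(2000 * 200, lambda = gene_means), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("cell", 1:200)))

## The log-based default: library-size scaling, then log(x + 1).
sf <- colSums(umi) / mean(colSums(umi))
logcounts <- log2(t(t(umi) / sf) + 1)

## Pearson residuals from sctransform; the $y matrix is what downstream
## steps (PCA, neighbor searches, distances) would consume.
pearson_res <- sctransform::vst(umi)$y

## Compare cell-cell Euclidean distances under the two transformations;
## their scale and spread will generally differ.
summary(as.vector(dist(t(logcounts))))
summary(as.vector(dist(t(pearson_res))))
```

The point is not that either set of distances is wrong, only that the behaviour of the log-transformed values is easy to reason about, while the residuals need more care to interpret.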