Transforming Variance Stabilised Counts to Relative Abundances
adityabandla (@adityabandla-11584) asked:

For my metagenomics dataset, I would like to retain only genes that are >0.1% in relative abundance, both for plotting and for subsetting differentially abundant genes.

I applied the VST to the raw counts. Is it OK to convert variance stabilised counts to relative abundances, or is it better to do this filtering by first transforming the raw counts matrix to relative abundances?

Edit: since the VST output is on the log2 scale, can I back-transform my VST counts as 2^x to obtain normalised counts, and then convert these into relative abundances?
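
For concreteness, a minimal sketch of the back-transformation I have in mind, assuming an existing DESeqDataSet `dds` with size factors estimated (all object names are placeholders):

    library(DESeq2)

    ## Variance stabilising transformation of the raw counts.
    vsd <- vst(dds, blind = TRUE)

    ## Proposed back-transformation: 2^x on the VST values. This only
    ## approximately recovers the normalised counts, since the VST is close
    ## to log2 for large counts but deliberately deviates for small ones.
    back_transformed <- 2^assay(vsd)

    ## Compare with the directly normalised counts for the first sample:
    head(cbind(vst_back   = back_transformed[, 1],
               normalised = counts(dds, normalized = TRUE)[, 1]))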

Tag: deseq2
@wolfgang-huber-3550 (EMBL European Molecular Biology Laboratory) answered:

At the core, this is a question about the variance-bias trade-off between two different estimators. The variance-bias trade-off is, of course, a huge topic that pervades much of statistics (see e.g. http://stats.stackexchange.com/questions/20295/what-problem-do-shrinkage-methods-solve). With your finite data sample, you can only imperfectly estimate the true, underlying abundance, and the question is which trade-offs you accept. The two main opposing goals are precision and unbiasedness.

The "naive" counts (after suitable library size normalization) are unbiased estimators of the true abundance, but for small numbers they can be highly variable. Also, ratios between them have nasty finite-sample behavior. In contrast, the VST aims to trade a more or less small amount of bias for a big reduction in variability and more normal behavior. (This applies to the small counts; for the large ones, the VST and log2 are essentially equivalent.)

So there is really no apodictic answer to your question; it depends on what you want to do. That said, I'd choose the normalized but otherwise untransformed values for a task such as the one you describe, just because it's simpler (Occam's razor).
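
For illustration, a minimal sketch of that simpler route, assuming a DESeqDataSet `dds` with size factors estimated; the 0.001 cut-off mirrors the 0.1% threshold from the question:

    ## Normalized, otherwise untransformed counts -> relative abundances.
    norm_counts <- counts(dds, normalized = TRUE)
    rel_ab      <- sweep(norm_counts, 2, colSums(norm_counts), "/")

    ## Keep genes whose mean relative abundance exceeds 0.1%.
    keep <- rowMeans(rel_ab) > 0.001
    dds_filtered <- dds[keep, ]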

To add to the points here about bias and variance: counts scale with feature length (here, feature = gene), so the typical abundance measures divide out both library size and feature length. If you really need an estimate of abundance that is comparable across genes, you would want to divide out the feature length.
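
As a sketch, DESeq2's fpkm() gives length-corrected values when gene lengths are available; `gene_lengths` below is a hypothetical per-gene vector of lengths in base pairs:

    ## fpkm() takes feature lengths from rowRanges(dds), or from a
    ## `basepairs` column in mcols(dds) when ranges are not set.
    mcols(dds)$basepairs <- gene_lengths  # hypothetical lengths in bp
    length_corrected <- fpkm(dds, robust = TRUE)

    ## Relative abundances that are comparable across genes:
    rel_ab_len <- sweep(length_corrected, 2, colSums(length_corrected), "/")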

If I were to re-phrase things, I would ask: do you really want to filter out low-abundance features, or features with a low signal-to-noise ratio? Our transformations help with the latter.

Another note, which I'm not sure is widely known: if you use a fast transcript quantifier like Salmon upstream of DESeq2, then the transformations in DESeq2 correct both for library size and for potential changes in feature length across samples.
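
A minimal sketch of that route, assuming per-sample Salmon quant.sf files listed in `files`, a `tx2gene` transcript-to-gene table, and a `samples` data.frame with a `condition` column (all placeholder names):

    library(tximport)
    library(DESeq2)

    ## Import Salmon quantifications, summarised to the gene level.
    txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
    dds <- DESeqDataSetFromTximport(txi, colData = samples,
                                    design = ~ condition)

    ## The transformations then use average-transcript-length offsets,
    ## correcting for library size and for length changes across samples.
    vsd <- vst(dds)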
