Can VST-transformed community composition data be expressed as relative abundance?
dwas • 0


I aim to correlate taxon-specific relative/total abundances of soil microbes (derived from 16S sequencing on Illumina MiSeq) with metadata along an environmental soil gradient.

To normalize samples for variable sequencing depth (18k to 45k reads), I have so far rarefied the community data by iterative subsampling (ISS) to the lowest sequencing depth (i.e. 18k reads) and used these ISS-normalized abundances as input for the subsequent correlation analysis. However, on searching the literature I realized that this type of normalization may be unsuitable for differential abundance analysis. McMurdie and Holmes (2014) propose, among other approaches, using the variance stabilizing transformation (VST) from the DESeq2 package (function varianceStabilizingTransformation()) as a precursor to differential abundance analysis.
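For readers unfamiliar with rarefying, here is a minimal sketch of the idea in plain Python. It assumes that ISS means averaging many random subsamples drawn without replacement to a fixed depth; the function names, the toy sample and the number of iterations are hypothetical and are not taken from any particular package.

```python
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Draw `depth` reads without replacement from a taxon -> count mapping."""
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the sample")
    drawn = Counter(rng.sample(reads, depth))
    return {taxon: drawn.get(taxon, 0) for taxon in counts}


def iterative_rarefy(counts, depth, n_iter=100, seed=0):
    """Average `n_iter` rarefied draws (assumed meaning of ISS)."""
    rng = random.Random(seed)
    totals = {taxon: 0 for taxon in counts}
    for _ in range(n_iter):
        for taxon, n in rarefy(counts, depth, rng).items():
            totals[taxon] += n
    return {taxon: total / n_iter for taxon, total in totals.items()}


# Hypothetical toy sample with 60 reads, rarefied to a depth of 20 reads.
sample = {"taxon_a": 30, "taxon_b": 25, "taxon_c": 5}
averaged = iterative_rarefy(sample, depth=20)
```

Every sample ends up with the same total (here 20), which is what makes depths comparable, at the cost of discarding reads from the deeper samples.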

Now I am wondering whether it is mathematically sound to take my raw abundance data (+1 to get rid of zeros), apply the VST, back-transform the resulting values using 2^x, and express taxon abundances as relative abundances (i.e. back-transformed VST abundance / sum(back-transformed VST abundances)) for each community (in my case, each soil sample). This matters for several steps in my downstream analysis, such as estimating total abundance by multiplying relative abundance by 16S qPCR data.

However, I do not understand VST well enough to judge whether such a workflow would be OK.
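To make the arithmetic in the question concrete, here is a minimal Python sketch of relative abundance and its scaling by a qPCR measurement, computed from raw counts. The function names, the toy counts and the qPCR value are hypothetical. The sketch shows only the calculation; it does not settle whether the same normalization is meaningful on back-transformed VST values.

```python
def relative_abundance(counts):
    """Divide each taxon count by the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def estimated_total_abundance(counts, qpcr_copies):
    """Scale relative abundances by a per-sample 16S qPCR estimate."""
    return {
        taxon: share * qpcr_copies
        for taxon, share in relative_abundance(counts).items()
    }


# Hypothetical sample: 1,000 reads and 2e8 16S copies per gram of soil.
counts = {"taxon_a": 600, "taxon_b": 300, "taxon_c": 100}
shares = relative_abundance(counts)
totals = estimated_total_abundance(counts, qpcr_copies=2.0e8)
```

The shares sum to 1 within a sample, so the scaled values sum to the qPCR estimate for that sample.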

vst relativeabundance Normalization DESeq2 • 400 views

The point of the VST is to make the data systematically homoskedastic (that is, to remove any systematic trend of variance over the mean, without e.g. forcing the data to have unit variance).
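As an illustration of what "no trend of variance over mean" looks like, here is a minimal Python simulation. Note the assumption: it uses the textbook square-root transform on Poisson counts, not the DESeq2 VST, which is derived from a fitted dispersion-mean trend under a negative binomial model. The helper name, the chosen means and the sample size are arbitrary.

```python
import math
import random
import statistics


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


rng = random.Random(42)
results = {}
for mean in (10, 40, 160):
    draws = [poisson_draw(mean, rng) for _ in range(4000)]
    raw_var = statistics.variance(draws)
    # Square-root transform: the classic variance stabilizer for Poisson counts.
    sqrt_var = statistics.variance([math.sqrt(x) for x in draws])
    results[mean] = (raw_var, sqrt_var)

for mean, (raw_var, sqrt_var) in results.items():
    print(f"mean={mean:4d}  raw variance={raw_var:8.2f}  sqrt variance={sqrt_var:.3f}")
```

On the raw scale the variance grows with the mean; after the transform it stays roughly constant (near 0.25) across the three means, without being forced to 1.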

A few notes:

  • VST doesn't involve pseudocounts ahead of the transformation.
  • I wouldn't do the VST then exponentiation for downstream analysis.
  • People seem to prefer models other than NB for microbiome data, see e.g. CoDa or these packages from Amy Willis lab:

Thank you for your fast and helpful clarifications.

I must have misunderstood the pseudocounts (episode CC195 of the Riffomonas project by PD Schloss on YouTube somehow left me with that idea).

Would it be mathematically admissible to estimate total abundance (by multiplying relative abundance by qPCR data) before doing the transformation?

I will also look at the elaborate paramedic package by the Willis group, I was not aware of it, thanks for this suggestion.

