I aim to correlate taxa-specific relative/total abundances (derived from 16S Illumina MiSeq Sequencing) of soil microbes with metadata along an environmental soil gradient.
In order to normalize samples for variable sequencing depths (18k - 45k reads) I have so far rarefied community data obtained by iterative subsampling (ISS) to the lowest sequencing depth (i.e. 18k reads). These ISS-normalized abundances I had used as input for subsequent correlation analysis. However, searching literature I realized that this type of normalization may be unsuitable for differential abundance analysis. McMurdie and Holmes (2014) propose (alongside other approaches) to use variance stabilizing transformation (VST) from the package DESeq2 (function varianceStabilizingTransformation()) as a precursor for subsequent differential abundance analysis.
Now I was wondering if it is mathematically sound if I use my raw abundance data (+1 to get rid of zeros), do VST, backtransform the resulting abundances using 2^x and express taxa abundances as a relative abundance (i.e. VST&backtransformed abundances / sum(VST&backtransformed abundances) ) for each community (in my case, for each soil sample)? For several reasons in my downstream analysis (such as getting to estimates of total abundance by multiplication of relative abundance with 16S qPCR data), this would be relevant for my goals.
However, I do not understand VST well enough to figure out whether such workflow would be ok? ```