For my metagenomics dataset, I would like to retain only genes with a relative abundance above 0.1%, both for plotting and for subsetting differentially abundant genes.
I applied VST to the raw counts. Is it OK to convert the variance-stabilised counts to relative abundances, or is it better to do this filtering by first transforming the raw count matrix to relative abundances?
Edit: Since VST counts are log2-transformed, can I back-transform my VST counts as 2^x to get the normalised counts, and then convert these into relative abundances?
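For the second option in the question (filtering on relative abundances computed from the raw count matrix), a minimal sketch might look like the following. It assumes a genes x samples matrix of raw counts, and it assumes "above 0.1%" means the mean per-sample proportion across samples; requiring the threshold in at least one sample, or in every sample, would be equally reasonable choices. The function names are hypothetical. It starts from raw counts rather than from 2^x of VST values, because the VST is only approximately log2 for larger counts, so back-transforming would not exactly recover normalised counts.

```python
import numpy as np

def relative_abundance(counts):
    """Convert a genes x samples count matrix to per-sample proportions (columns sum to 1)."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # total counts per sample (column sums)
    return counts / lib_size       # broadcasting divides each column by its library size

def keep_abundant(counts, threshold=0.001):
    """Hypothetical filter: True for genes whose mean relative abundance exceeds threshold.

    threshold=0.001 corresponds to 0.1%; averaging over samples is an assumption.
    """
    rel = relative_abundance(counts)
    return rel.mean(axis=1) > threshold

# Toy example: 3 genes x 2 samples, each sample with 1000 total counts
counts = np.array([[500, 800],
                   [1, 0],
                   [499, 200]])
mask = keep_abundant(counts)
filtered = counts[mask]  # rows (genes) that pass the filter
```

The boolean mask can then be used to subset the same rows of a VST matrix for plotting, so the filter is defined on the raw data while the plotted values stay variance-stabilised.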
To add to the points here about bias and variance: counts scale with feature length (here, feature = gene), so the typical abundance measures (e.g., TPM) divide out both library size and feature length. If you really need an estimate of abundance that is comparable across genes, you would want to divide out the feature length.
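To make "divide out library size and feature length" concrete, here is a TPM-style sketch, assuming one length per gene (for example, gene length in base pairs) and a genes x samples count matrix. The function name is hypothetical. Each count is first divided by its gene's length, then each sample is rescaled to sum to 1; multiplying by 1e6 gives the usual TPM scale.

```python
import numpy as np

def length_normalized_abundance(counts, lengths, scale=1.0):
    """TPM-style abundance: divide out feature length, then library size.

    counts: genes x samples; lengths: one length per gene (assumed same units for all genes).
    Use scale=1e6 for TPM; scale=1.0 returns per-sample proportions.
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(lengths, dtype=float).reshape(-1, 1)  # column vector, broadcast over samples
    rate = counts / lengths                  # reads per unit length for each gene
    return scale * rate / rate.sum(axis=0)   # normalise each sample (column) to sum to scale

# Two genes with equal counts but different lengths:
# the shorter gene gets the larger length-normalised abundance.
counts = np.array([[100], [100]])
lengths = np.array([1000, 500])
props = length_normalized_abundance(counts, lengths)
```

Here the 500 bp gene ends up with twice the abundance of the 1000 bp gene, even though both received 100 reads, which is exactly the length effect the raw counts hide.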
To rephrase the question: do you really want to filter out low-abundance features, or features with a low signal-to-noise ratio? Our transformations (VST and rlog) help with the latter.
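One way to see why low counts mean low signal-to-noise: under Poisson sampling, a gene with expected count mu has variance mu, so its coefficient of variation is 1/sqrt(mu); negative binomial overdispersion only adds to this (CV^2 = 1/mu + alpha). The sketch below shows that relationship, plus a count-based filter that targets noisy features directly instead of using a relative-abundance cutoff. The filter and its defaults (at least 10 reads in at least 3 samples) are a common heuristic chosen here for illustration, not values taken from the discussion above.

```python
import numpy as np

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count with the given mean: sqrt(mu) / mu."""
    return 1.0 / np.sqrt(np.asarray(mean_count, dtype=float))

def keep_well_measured(counts, min_count=10, min_samples=3):
    """Hypothetical filter: keep genes with >= min_count reads in >= min_samples samples."""
    counts = np.asarray(counts)
    return (counts >= min_count).sum(axis=1) >= min_samples

# CV drops from 1.0 at a mean of 1 read to 0.1 at a mean of 100 reads
cvs = poisson_cv([1, 100])

# Gene 1 is consistently measured; gene 2 has a single high sample
counts = np.array([[10, 12, 15],
                   [0, 50, 0]])
mask = keep_well_measured(counts, min_count=10, min_samples=2)
```

A gene can pass a 0.1% abundance cutoff on the strength of one sample and still be noisy across the experiment, which is why a filter (or transformation) aimed at noise can behave quite differently from one aimed at abundance.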
Another note, which I'm not sure is widely known: if you use a fast transcript quantifier like Salmon "upstream" of DESeq2, then the DESeq2 transformations correct for both library size and potential changes in feature length across samples.