Hi all, I have been combing the internet for hours trying to find a clear answer to this question and am a bit stumped. I apologize in advance for the basic question and the long description of my project; I'm just trying to give enough context.
I am attempting to look at differences in abundance in shotgun metagenome samples that came from dust, specifically gene abundances across contigs rather than MAGs. I co-assembled my reads (using normalized reads) into contigs, then mapped the non-normalized reads from each metagenome back to the assembled contigs to estimate the abundance of these genes across the metagenomes (a community-function-based approach). After mapping the reads back to the genes found in the assembly and counting the reads mapped per gene with featureCounts, I divided each gene's read count by its length, and now I would like to sum these coverages by KO to get a sense of functional abundances in the metagenomes.
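For concreteness, the per-gene length normalization and per-KO summing described above can be sketched as follows. This is a minimal illustration, not my actual pipeline: the input dictionaries and the function name `ko_coverage` are hypothetical, and it assumes the per-gene counts have already been parsed from the featureCounts table and that each gene has at most one KO assignment.

```python
from collections import defaultdict


def ko_coverage(counts, lengths, gene_to_ko):
    """Sum length-normalized read counts by KO for each sample.

    counts:     {sample: {gene_id: read_count}}  (hypothetical, parsed from featureCounts)
    lengths:    {gene_id: gene_length_bp}
    gene_to_ko: {gene_id: ko_id}; genes without a KO are skipped
    Returns {sample: {ko_id: summed reads-per-bp}}.
    """
    result = {}
    for sample, gene_counts in counts.items():
        per_ko = defaultdict(float)
        for gene, n_reads in gene_counts.items():
            ko = gene_to_ko.get(gene)
            if ko is None:
                continue  # unannotated gene: not counted toward any KO
            # reads divided by gene length, then accumulated per KO
            per_ko[ko] += n_reads / lengths[gene]
        result[sample] = dict(per_ko)
    return result


if __name__ == "__main__":
    counts = {"site1": {"g1": 100, "g2": 50, "g3": 10},
              "site2": {"g1": 200, "g2": 0, "g3": 30}}
    lengths = {"g1": 1000, "g2": 500, "g3": 100}
    gene_to_ko = {"g1": "K00001", "g2": "K00001"}  # g3 has no KO
    print(ko_coverage(counts, lengths, gene_to_ko))
```

Note that the result is no longer integer counts, which matters if the table is later handed to DESeq2.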
A colleague suggested I use the median-of-ratios transformation (MRT) from DESeq2 to transform my coverages for ordinations, regressions, and PERMANOVAs, to get a sense of what is driving functional differences between the sites. However, the DESeq2 vignettes suggest using the VST or rlog transformations for this purpose, rather than the median-of-ratios method that the DESeq function uses for DGE.
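To make the comparison concrete, here is a rough sketch of the median-of-ratios idea: each feature's geometric mean across samples serves as a pseudo-reference, and each sample's size factor is the median of its ratios to that reference. This is my own illustration written from the published description, not DESeq2's code; the function names are hypothetical, and it assumes a plain features × samples table. It skips features with a zero in any sample, which as far as I understand matches DESeq2's default behaviour. The output is rescaled values on the original count scale, not log- or variance-stabilized values. Also note that DESeq2 itself expects raw integer counts as input, not length-divided values.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Size factor per sample from a features x samples table (list of rows)."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(x <= 0 for x in row):
            continue  # geometric mean would be 0 (log is -inf), so skip this feature
        log_geo_mean = sum(math.log(x) for x in row) / n_samples
        for j, x in enumerate(row):
            log_ratios[j].append(math.log(x) - log_geo_mean)
    # median of the per-feature ratios, back-transformed from log scale
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, size_factors):
    """Divide each sample's column by its size factor (still on the count scale)."""
    return [[x / s for x, s in zip(row, size_factors)] for row in counts]


if __name__ == "__main__":
    table = [[10, 20], [20, 40], [5, 10], [0, 5]]  # last feature is skipped
    sf = median_of_ratios_size_factors(table)
    print(sf)
    print(normalize(table, sf))
```

In this toy table the second sample has exactly twice the depth of the first, so the two size factors come out as 1/√2 and √2.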
My question is: is there a reason not to use MRT as a transformation outside of DGE? Why is VST better than MRT for ordinations and other downstream analyses (outside of DGE)? I am having a bit of trouble understanding the nuance between MRT and the VST method employed by DESeq2. Is MRT less sensitive to gene dispersions than VST? I am also having trouble finding the exact equation used by the VST function in DESeq2... maybe I have just been looking at too many links today (see below).
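On the equation question, here is my best reconstruction, which is an assumption rather than something confirmed in the docs. With a parametric dispersion trend α(μ) = a0 + a1/μ, the negative binomial variance is v(μ) = μ + α(μ)μ² = (1 + a1)μ + a0μ², and a variance-stabilizing function is the integral of 1/√v(μ). That integral has a closed form, which appears to correspond to the parametric branch of DESeq2's internal `getVarianceStabilizedData` (there, a0 and a1 are the `asymptDisp` and `extraPois` coefficients). The sketch below is hypothetical; `parametric_vst` is my own name, and q is a normalized count (count divided by its size factor). It is scaled so that it approaches log2(q) for large q.

```python
import math


def parametric_vst(q, asympt_disp, extra_pois):
    """Closed-form VST for dispersion trend alpha(mu) = asympt_disp + extra_pois / mu.

    q: normalized count (raw count / size factor), q >= 0.
    Integrates 1 / sqrt(v(mu)) with v(mu) = (1 + extra_pois) * mu + asympt_disp * mu**2,
    then rescales so the result behaves like log2(q) for large q.
    """
    a0 = asympt_disp
    b = 1.0 + extra_pois
    inner = b + 2.0 * a0 * q + 2.0 * math.sqrt(a0 * q * (b + a0 * q))
    return math.log2(inner / (4.0 * a0))


if __name__ == "__main__":
    for q in (0, 1, 10, 100, 1000):
        print(q, round(parametric_vst(q, asympt_disp=0.1, extra_pois=1.0), 3))
```

Unlike a plain log, this stays finite at q = 0 and compresses the low-count range, where Poisson noise dominates.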
Here are some resources I've used to try to answer this question; I'm definitely open to more. I know there is the DESeq2 paper, which I've read, but I think I am just missing something in it that compares these methods... Sorry for the long post, and thanks for your help!
I appreciate the clear explanation, thank you so much! Frankly, I see too many people using transformation, normalization, and scaling interchangeably, which makes it hard to follow and thus to decide which methods are most appropriate.