I'd like to second Sasha's request for clarifications on the ZIG and Feature Model methods, and the recommended use of these tests. From the metagenomeseq.pdf page 16:
" This is our latest development and we recommend fitFeatureModel over fitZig [...] By reparametrizing our zero-inflation model, we’re able to fit a zero-inflated model for each specific OTU separately. We currently recommend using the zero-inflated log-normal model as implemented in fitFeatureModel "
Account of how these functions work are found in the associated NMethods paper:
" To explicitly account for undersampling, we use a mixture model that implements a ZIG distribution of mean group abundance for each taxonomic feature [...] Using posterior probability estimates that account for community undersampling as weights to estimate count distribution parameters reduced the estimated fold change between the two groups under study. Furthermore, counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08). "
from these readings:
fitZIG uses a Zero-Inflated, Gaussian (normal) distribution, mixture-model with posterior-probability weighting of OTU abundances to model the distribution of OTU counts.
fitFeatureModel instead uses a Zero-Inflated, Log-Normal distribution, mixture-model with the same P.P weighting as above, because as per the NMethods paper: "counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08).". So fitFeatureModel provides better approximations to a normal distribution for our OTUs, making our Moderated T Tests more reliable (see next, please correct as required).
From what I can tell, both methods then use the moderated T test à la the limma package (see page 60) to test whether actual counts diverge from their respective models significantly (are differentially abundant) with respect to experimental factors (& an added error value ε as per mixture models) defined in model selection.
So, both methods do the 'same' thing, but are based in different distributions (gaussian / log-normal) for standardising counts between samples.
However,
- neither method's output (via MRcoefs/fulltable etc) is explained/worked through. While fitFeatureModel does report log2 fold change, fitZig is ambiguous (column title is simply the coefficient of interest).
- It is not clear whether or not the Moderated T Test is measuring analogous values for either function (e.g. are reported coefficient values log2FC in both, thus both are T-Tested and loosely comparable?)
- Nor is it stated anywhere (in the manual or NMethods) what the differential test being carried out is, although there are references to limma (which uses moderated T Test).
- fitZig is capable of [testing multiple groups (section 4.2)]*, fitFeatureModel is not. I expect many could be tempted to use fitZig for their multiple testing and disregard fitFeatureModel's improvements.
- metagenomeSeq 1.16.0: unique features give NA's for logFC / p-val?. From repeating the test with fitZig, it does not have this issue but logFC (?) values differ, so results may not be comparable. Again, a temptation to use fitZig over fitFeatureModel to overcome this.
From this: fitFeatureModel is closest to parametric methods, and is preferable to fitZig if it works for your situation. fitZig is a looser parametric method with more obtuse output but better support in this package. Any corrections are gladly accepted, and the efforts of the authors are much appreciated.
PS: * = edited from 'doing multiple testing' for clarity
The fitZig output is analogous to Limma's topTable output due to historical reasons and how closely fitZig was built on top of the lmFit function. The column title is intended to be the coefficient of interest (as logFC is really just the estimate the coefficient of interest in our case). I agree though that there should be similarity in output between fitZig and fitFeatureModel so I will work on that.
I don't understand most of your other questions/comments. For example, multiple testing correction is always possible and recommended. The user can take any set of p-values and perform their favorite correction using p.adjust or other functions.
Apologies, I'll try to correct & clarify:
[edit:] Additionally, the contribution of the log2 transformation to CSS is not conveyed in the metagenomeSeq manual. It's importance in discriminating populations is not addressed, and although it defaults=TRUE and is implemented in the workthrough provided, there is no guidance or explanation on its use. Amending this would leave users more informed on their analysis.