Question

Difference between fitFeatureModel and fitZIG in metagenomeSeq

sasha • 0

@sasha-11847

Last seen 8.6 years ago

Hi,

I would like to know the differences between fitFeatureModel and fitZIG when testing for differential abundance (DA) with metagenomeSeq. I can't find a good explanation of what fitFeatureModel does in detail or how it differs from fitZIG. However, since the authors (and others in the literature) recommend fitFeatureModel over fitZIG for microbiome analysis (due to high sensitivity and low FDR), I would like to test it on my data. Could someone PLEASE point me to a link or explain it to me?

MANY thanks in advance and cheers,

Sasha

metagenomeseq • 5.4k views

8.9 years ago • updated 8.8 years ago • sasha

score 1 · Answer 1 · 2017-03-29

I'd like to second Sasha's request for clarification on the ZIG and Feature Model methods, and on the recommended use of these tests. From metagenomeseq.pdf, page 16:

" This is our latest development and we recommend fitFeatureModel over fitZig [...] By reparametrizing our zero-inflation model, we’re able to fit a zero-inflated model for each specific OTU separately. We currently recommend using the zero-inflated log-normal model as implemented in fitFeatureModel "

An account of how these functions work is found in the associated NMethods paper:

" To explicitly account for undersampling, we use a mixture model that implements a ZIG distribution of mean group abundance for each taxonomic feature [...] Using posterior probability estimates that account for community undersampling as weights to estimate count distribution parameters reduced the estimated fold change between the two groups under study. Furthermore, counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08). "

From these readings:

fitZIG uses a zero-inflated Gaussian (normal) mixture model, with posterior-probability weighting of OTU abundances, to model the distribution of OTU counts.
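To make "posterior-probability weighting" concrete, here is a minimal Python sketch of the E-step of a generic zero-inflated Gaussian mixture for a single feature. It is a toy illustration, not metagenomeSeq's code: the function names, the fixed mixing proportion `pi_zero`, and the assumption that `y` is a transformed abundance in which 0 corresponds to an observed zero count are all mine.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def zero_posterior(y, pi_zero, mu, sigma):
    """Posterior probability that y came from the point mass at zero.

    pi_zero is the (assumed known) prior probability of the zero component;
    mu and sigma describe the Gaussian count component.
    """
    if y != 0:
        return 0.0  # a nonzero value cannot come from the point mass at zero
    spike = pi_zero
    count = (1.0 - pi_zero) * normal_pdf(0.0, mu, sigma)
    return spike / (spike + count)


def weighted_mean(ys, pi_zero, mu, sigma):
    """Mean abundance, weighting each value by its probability of being a real count."""
    weights = [1.0 - zero_posterior(y, pi_zero, mu, sigma) for y in ys]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
```

With values `[0, 0, 4, 5]`, `pi_zero=0.5`, `mu=4` and `sigma=1`, the two zeros are attributed almost entirely to the zero component, so the weighted mean stays close to the mean of the nonzero values instead of being pulled down to 2.25. A full EM fit would alternate this step with re-estimating `pi_zero`, `mu` and `sigma`.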

fitFeatureModel instead uses a zero-inflated log-normal mixture model with the same posterior-probability weighting as above because, as per the NMethods paper, "counts after accounting for undersampling were better fit by a log-normal distribution (Shapiro-Wilks test, P = 0.78) than were normalized counts (Shapiro-Wilks test, P = 0.08)." So fitFeatureModel provides a better approximation to a normal distribution for our OTUs, making our moderated t-tests more reliable (see next; please correct as required).
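As a rough picture of what a per-feature zero-inflated log-normal description looks like, the Python sketch below summarises one feature by its fraction of zeros plus the mean and standard deviation of the log2 positive counts. This only illustrates the distributional idea; it is not how fitFeatureModel estimates its parameters, and the helper names are hypothetical.

```python
import math
import statistics


def ziln_summary(counts):
    """Return (zero fraction, mean log2 positive count, SD of log2 positive counts)."""
    positives = [math.log2(c) for c in counts if c > 0]
    zero_fraction = 1.0 - len(positives) / len(counts)
    if len(positives) < 2:
        # Too few positive counts to estimate the log-normal component.
        return zero_fraction, None, None
    return zero_fraction, statistics.mean(positives), statistics.stdev(positives)


def log2_fold_change(group_a, group_b):
    """Difference in mean log2 positive abundance (b minus a), or None if not estimable."""
    mean_a = ziln_summary(group_a)[1]
    mean_b = ziln_summary(group_b)[1]
    if mean_a is None or mean_b is None:
        return None
    return mean_b - mean_a
```

For example, `log2_fold_change([0, 2, 8], [0, 8, 32])` compares mean log2 values of 2 and 4 in this toy version; a group with no positive counts has no estimable log-normal mean, so the function returns `None`.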

From what I can tell, both methods then use the moderated t-test à la the limma package (see page 60) to test whether actual counts diverge significantly from their respective models (i.e. are differentially abundant) with respect to the experimental factors defined in model selection (plus an added error term ε, as per mixture models).
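For reference, the moderated t-statistic in limma shrinks each feature's residual variance toward a prior variance before forming the t-ratio. The Python sketch below computes that statistic from already-estimated quantities; the argument names are mine, the prior values would normally be estimated from all features together, and whether each metagenomeSeq function applies the statistic exactly this way is not confirmed here.

```python
import math


def moderated_t(coef, s2, df_resid, unscaled_sd, s2_prior, df_prior):
    """Moderated t-statistic and its degrees of freedom.

    coef        : estimated coefficient (e.g. a log fold change)
    s2          : the feature's residual variance
    df_resid    : residual degrees of freedom
    unscaled_sd : unscaled standard deviation of the coefficient
    s2_prior    : prior variance (assumed to be given)
    df_prior    : prior degrees of freedom (assumed to be given)
    """
    # Posterior variance: weighted average of the prior and observed variances.
    s2_post = (df_prior * s2_prior + df_resid * s2) / (df_prior + df_resid)
    t = coef / (unscaled_sd * math.sqrt(s2_post))
    return t, df_prior + df_resid
```

With `coef=2`, `s2=4`, `df_resid=6`, `unscaled_sd=0.5`, `s2_prior=1` and `df_prior=4`, the posterior variance is 2.8 rather than 4, so the moderated statistic is larger than the ordinary t of 2 and is referred to 10 degrees of freedom instead of 6.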

So both methods do the 'same' thing, but are based on different distributions (Gaussian vs. log-normal) for standardising counts between samples.

However:

Neither method's output (via MRcoefs, MRfulltable, etc.) is explained or worked through. While fitFeatureModel does report log2 fold change, fitZig is ambiguous (the column title is simply the coefficient of interest).
It is not clear whether the moderated t-test measures analogous values for the two functions (e.g. are the reported coefficient values log2FC in both, so that both are t-tested and loosely comparable?).
Nor is it stated anywhere (in the manual or NMethods) which differential test is carried out, although there are references to limma (which uses a moderated t-test).
fitZig is capable of [testing multiple groups (section 4.2)]*; fitFeatureModel is not. I expect many could be tempted to use fitZig for their multi-group testing and disregard fitFeatureModel's improvements.
See also "metagenomeSeq 1.16.0: unique features give NA's for logFC / p-val?". Repeating the test with fitZig does not show this issue, but the logFC (?) values differ, so results may not be comparable. Again, this is a temptation to use fitZig over fitFeatureModel.

From this: fitFeatureModel is closest to parametric methods and is preferable to fitZig if it works for your situation. fitZig is a looser parametric method with more opaque output but better support in this package. Any corrections are gladly accepted, and the efforts of the authors are much appreciated.

PS: * = edited from 'doing multiple testing' for clarity

score 1 · Answer 2 · 2017-03-29

There is a bit of confusion here that I will try to address. fitZig and fitFeatureModel implement two different methods.

fitZig is the approach described in http://www.nature.com/nmeth/journal/v10/n12/full/nmeth.2658.html.
fitFeatureModel is described in Chapter 4 of Normalization and differential abundance analysis of metagenomic biomarker-gene surveys (available here: http://drum.lib.umd.edu/handle/1903/16996 ). A paper summarizing the results of Chapter 4 is in the works, and potentially another paper summarizing the various new features of metagenomeSeq (including fitFeatureModel, etc.).

I will add specific references to the model descriptions in the devel vignette for clarity. Thanks for bringing this up.

score 0 · Answer 3 · 2017-04-03

Thanks to both handibles and Joseph

8.8 years ago • sasha