Question

NAs in continuous coefficient and fold change in `zlm`

2

Andrew_McDavid ▴ 280

@andrew_mcdavid-11488

United States

I noticed that MAST sometimes returns NA for the continuous coefficient. When this happens, the estimated fold change is also NA. I couldn't find much about this in the documentation. Can someone explain it?

This is a problem for me because I'm trying to select the most differentially expressed genes using the fold change. When I do this, some genes are left out even though they have very large discrete components. The only workaround I see is to select genes using the discrete component and ignore the continuous component altogether, but at that point I might as well just use logistic regression.

If someone can help out at all, I would really appreciate it. Thanks!

scrnaseq MAST differential gene expression • 2.6k views

score 2 · Answer 1 · 2017-08-14

TL;DR

One of the classes you are comparing has no detectable expression for a gene. The log-fold change is meant to estimate effect size, but that is not well-defined when expression is never observed in a class. P-values from `lrTest` or `waldTest` are still well-defined.
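To illustrate why a test can remain well-defined even when one class never shows expression, here is a minimal sketch of a likelihood-ratio test on detection rates alone. This is not MAST's `lrTest`: the function names are hypothetical, it covers only a discrete (detected or not) part with a single two-level covariate, and it uses plain maximum likelihood instead of whatever fitting method the package applies.

```python
import math

def binom_loglik(k, n, p):
    """Binomial log-likelihood (without the constant), treating 0*log(0) as 0."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll

def detection_lrt(k1, n1, k0, n0):
    """Hypothetical helper: LRT comparing detection rates in two classes.

    k1/n1 and k0/n0 are detected/total cell counts in conditions 1 and 0.
    Returns (statistic, p_value) using a chi-square reference with 1 df.
    """
    pooled = (k1 + k0) / (n1 + n0)
    full = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k0, n0, k0 / n0)
    null = binom_loglik(k1, n1, pooled) + binom_loglik(k0, n0, pooled)
    stat = max(2.0 * (full - null), 0.0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Condition 0 never exceeds the hurdle, yet the statistic is finite.
stat, p_value = detection_lrt(15, 20, 0, 20)
```

The point of the sketch is only that the likelihood stays finite when a class has zero detections, so the test statistic and its p-value can still be computed.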

Fine print

From the two-part model MAST defines, we derive an estimate of the log-fold change in expression between conditions 1 and 0 as

$$ E(U_1)E(V_1) - E(U_0)E(V_0) $$

where $V_1 = 1$ when expression $Y_1$ exceeds the hurdle value and $U_1 = Y_1$ given $V_1 = 1$ ($V_0$ and $U_0$ are defined analogously). See `?logFC` for more details.

The estimates for these expectations come from the fitted model, and when we never exceed the hurdle in condition 0 or 1, there is no information with which to estimate $E(U)$ for that condition. You might be tempted to set it to zero, but that's not quite right: we are operating on a log scale, so minus infinity would be needed for mathematical consistency. In any case, this still doesn't give us a confidence interval or a significance test.
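A toy sketch of how the NA arises, assuming we replace the fitted-model expectations with simple per-class sample averages (MAST itself uses model coefficients, so this only mirrors the structure of the formula). The function names and the hurdle of 0 are illustrative assumptions.

```python
import math

def two_part_summary(values, hurdle=0.0):
    """Return (E(V), E(U)) estimated by sample averages for one condition.

    E(V) is the fraction of cells above the hurdle; E(U) is the mean of
    those cells, or NaN when no cell exceeds the hurdle.
    """
    detected = [y for y in values if y > hurdle]
    rate = len(detected) / len(values)
    mean_detected = sum(detected) / len(detected) if detected else math.nan
    return rate, mean_detected

def log_fold_change(cond1, cond0, hurdle=0.0):
    """E(U_1)E(V_1) - E(U_0)E(V_0), with values already on a log scale."""
    v1, u1 = two_part_summary(cond1, hurdle)
    v0, u0 = two_part_summary(cond0, hurdle)
    return u1 * v1 - u0 * v0

expressed = [0.0, 2.0, 3.0, 0.0]
partly_expressed = [0.0, 1.0, 1.0, 1.0]
never_expressed = [0.0, 0.0, 0.0, 0.0]

defined_lfc = log_fold_change(expressed, partly_expressed)
undefined_lfc = log_fold_change(expressed, never_expressed)
```

Here `defined_lfc` is an ordinary number, while `undefined_lfc` is NaN: the detection rate in the silent class is 0, but the mean of its detected values has nothing to average, and the NaN propagates through the product and the difference.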

There is experimental support for estimating $U$ in this case by applying a shrinkage prior (essentially imputing in these cases), enabled by setting `useContinuousBayes = TRUE`. I can't really recommend this in the current version (MAST_1.3.1) because the fit seems to be somewhat numerically unstable and sometimes fails to converge. We are trying to fix this in the next version.