Question: Estimation of non-trivial contrasts in MAST
gravatar for Nik Tuzov
6 weeks ago by
Nik Tuzov70
United States
Nik Tuzov70 wrote:


Could you tell me if there is an easy way to estimate non-trivial contrasts in MAST? E.g. there are two factors, A and B, with 3 and 5 levels, the model is Y = A + B + A*B, and I want to test (A1, B2) vs (A2, B5). I assume the answer is no because a similar question was asked of DESeq2 a while ago and MAST doesn't look different in that sense.

mast • 116 views
ADD COMMENTlink modified 6 weeks ago by Andrew_McDavid190 • written 6 weeks ago by Nik Tuzov70
Answer: Estimation of non-trivial contrasts in MAST
gravatar for Andrew_McDavid
6 weeks ago by
Andrew_McDavid190 wrote:

Yes, this is possible, and AFAIK is possible in DESeq2, too. It's easy to get tripped up by R builds design matrices from the symbolic formula that contain interactions. When in doubt, expand the interactions explicitly with e.g. interaction.

ng = 10
nc = 20
cd = expand.grid(A = factor(1:3), B = factor(1:5), rep = 1:nc)
cd$AB = with(cd, interaction(A, B))
mm = model.matrix(~AB, data = cd)
eta = mm %*% c(3, -2, 0, # (A=2, B=1) - (A=1, B=1), not tested
               2, #(A=1, B=2) - (A=1, B=1), tested
               rep(0, 11)) # rest null 
U = matrix(eta, nrow = length(eta), ncol = ng)
U = U + rnorm(length(U))
PV = as.vector( exp(eta) / (1 + exp(eta)))
V =  1*(runif(length(U)) < PV)
dim(V) = dim(U)
Y = U*V
sca = FromMatrix(t(Y),cData = cd)
zz = zlm(~AB, sca = sca)

waldTest(zz, Hypothesis("AB2.5 - AB1.2"))
# Similar to (up to sampling error)
waldTest(zz, Hypothesis("-AB1.2"))
# The AB2.5 coefficient should be approx. zero
waldTest(zz, Hypothesis("AB2.5"))

Really, this is more of a base-R question. Forsake not your prophets Venables and Ripley. Chapter 6 shall provide the Light. the Truth, and the Way.

ADD COMMENTlink written 6 weeks ago by Andrew_McDavid190

Hello Andrew:

Thanks for replying. I also have a MAST specific question. The paper describes how shrinkage is done but it says little about why it was necessary in the first place. What kind of evidence did you get to justify the application of shrinkage?

Regards, Nik

ADD REPLYlink written 6 weeks ago by Nik Tuzov70

There are two forms of shrinkage happening, and they were developed after a fair amount of iterative exploration of data at the time, though little of this made it into the paper. One for the logistic regression coefficients, and one for the linear regression $\sigma^2$. In practice, the first one is probably more important, because it results in reasonable point estimates (and Wald confidence intervals) when there is linear separation, which is the rule rather than the exception.

The shrinkage on $\sigma^2$ is often less important given the larger number of replicates in scRNAseq, but can still have an impact since the number of cells with expression $>0$ varies from gene to gene. If filtering for the amount of expression hasn't occurred, then sometimes you are trying to make inference with only a handful of cells.

If you are curious what the $\sigma^2$ shrinkage is doing to your data you can get the "effective number of cells" and "prior precision" from the model object,


and turn off shrinkage with zlm(..., method = 'glm', ebayes = FALSE). In the simulation I ran you can see that the effective degrees of freedom is extremely high, reflecting the small number of genes and exact homogeneity in the variance. The variance is overestimated because by default, it is fit just using the intercept (the H0 model). See ?ebayes if you are curious for more details.

ADD REPLYlink written 6 weeks ago by Andrew_McDavid190

Hello Andrew:

You are probably familiar with this paper of Soneson and Robinson. Their overall conclusion is that the best single-cell-specific method was MAST, but it did as well as the best of bulk RNA methods. Do you think they got something wrong?

ADD REPLYlink written 5 weeks ago by Nik Tuzov70

Hello Andrew:

What happens in contrast estimation if there is a quantitative covariate? E.g. I have the following toy data set:

y <- c(0.0, 0.0, 0.01981825833, 0.40109357786, 0.34727231426, 0.03889038132, 0.0, 0.0,
      0.11225699507, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.90902480961)
A <- factor(c(rep("A1", 8), rep("A2", 8)))
Cov <- c(1.1, 5.5, -3.2, 4, 7.7, 2.2, 0.5, -4.4, 6.6, 3.3, 8.8, 77.1, 55, 44, 33, 22)
data <- data.frame(Cov, A)

After I run zlm() w/o shrinkage, the hurdle logFC is:

logFC      AA2 -0.1119203593

Given the formula from the reference:

u(contrast1)v(contrast1) − u(contrast0)v(contrast0)

the question is how to set the value of Cov. In general, it can/should be set at its mean, but the means in the continuous and discrete parts are different because the continuous part uses only 6 observations out of 16. I'd say one should use two distinct means here, but I didn't manage to reproduce logFC. I also tried using just one mean obtained from all the observations and it didn't work either. When I use just A in the model, I reproduce logFC, so apparently the discrepancy is due to Cov. Please help me clarify this.

ADD REPLYlink written 4 weeks ago by Nik Tuzov70
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 269 users visited in the last hour