Question

Interpretation of Z scores for gene set enrichment analysis in MAST

siajunren • 0

@siajunren-12197

I fit a model with zlm in MAST to a matrix containing 2 populations of cells, A and B, with population A as the intercept (reference). I then ran gseaAfterBoot on the ZlmFit object after bootstrapping with bootVcov1. If I get a positive Z score for a particular gene set, does it mean that genes in this gene set are upregulated in population A compared to B, and that a negative Z score means they are downregulated in A compared to B? Or is it the other way round? I am unsure of the directionality because I do not know what the model coefficients used to calculate the Z score represent.

MAST scRNA GSEA • 4.2k views

updated 6.8 years ago by Andrew_McDavid ▴ 270 • written 6.8 years ago by siajunren • 0

score 2 · Accepted Answer · 2017-07-28

The model coefficients being tested for enrichment are the coefficients from the ZlmFit object. With A as the reference level, a positive coefficient for a gene means that the mean of B > the mean of A. The GSEA calculates the average ZlmFit coefficient inside the set, so a positive Z-score in gseaAfterBoot means that B > A, on average, in the gene set. Conversely, a negative Z-score means that B < A, on average, so the direction is the other way round from your first guess.
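To make the sign convention concrete, here is a minimal base R sketch using an ordinary linear model rather than MAST's hurdle model. The data, variable names and effect size are all made up for illustration. It only shows how a treatment-coded coefficient relates to the reference level, which is the same convention the ZlmFit coefficients follow.

```r
# Toy data: hypothetical expression values for one gene in two populations.
set.seed(1)
population <- factor(rep(c("A", "B"), each = 50), levels = c("A", "B"))  # A is the reference level
expression <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))                # B is higher by construction

# Default treatment contrasts: the intercept is the reference group (A).
fit <- lm(expression ~ population)
coef(fit)
# "(Intercept)" estimates mean(A); "populationB" estimates mean(B) - mean(A)

# Check the coefficient against the raw difference in group means.
group_means <- tapply(expression, population, mean)
diff_B_minus_A <- unname(group_means["B"] - group_means["A"])
all.equal(unname(coef(fit)["populationB"]), diff_B_minus_A)
```

Because B was simulated with the higher mean, the `populationB` coefficient is positive: positive means "higher in the non-reference group".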

Example

library(MAST)
package.version("MAST")
## "1.3.0"
data(vbetaFA)
vb1 = subset(vbetaFA, ncells==1)
vb1 = vb1[,freq(vb1)>.1][1:15,]
zf = zlm(~Stim.Condition, vb1)
coef(zf, 'D') #discrete components

boots = bootVcov1(zf, 5) # 5 bootstrap replicates
sets = list(A = 1:2, # positive coefficients
            B = 3:6) # negative coefficients
gsea = gseaAfterBoot(zf, boots, sets, CoefficientHypothesis('Stim.ConditionUnstim'))
calcZ(gsea)[,,'Z'] # Z-score is positive for A, negative for B
summary(gsea) # pretty-printer
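If you would rather have the sign read as "A relative to B", one option is to change the reference level of the grouping factor before fitting. The sketch below uses base R only, with made-up data and hypothetical variable names. The same idea should carry over to the grouping column of the cell metadata before calling zlm, but that step is an assumption and is not shown here.

```r
# Toy data: hypothetical values for one gene, with B higher by construction.
set.seed(2)
cond <- factor(rep(c("A", "B"), each = 40))          # levels sort alphabetically, so A is the reference
y <- c(rnorm(40, mean = 0), rnorm(40, mean = 1.5))

# Same factor, but with B as the reference level.
cond_ref_b <- relevel(cond, ref = "B")

# The coefficient name is the variable name followed by the non-reference level.
coef_b_vs_a <- coef(lm(y ~ cond))[["condB"]]              # mean(B) - mean(A)
coef_a_vs_b <- coef(lm(y ~ cond_ref_b))[["cond_ref_bA"]]  # mean(A) - mean(B)

c(B_vs_A = coef_b_vs_a, A_vs_B = coef_a_vs_b)
```

The two coefficients have the same magnitude and opposite signs. The coefficient name passed to CoefficientHypothesis follows the same pattern: 'Stim.ConditionUnstim' is the Unstim level of Stim.Condition compared with that factor's reference level.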