MAST: What does the function gseaAfterBoot do exactly?
2
0
Entering edit mode
dfm2129 • 0
@dfm2129-15329
Last seen 3.3 years ago

Hi,
Can you provide details of the calculations gseaAfterBoot is doing with zlmCond (as it is named in the MAIT vignette) to generate the file "gsea"? It is not clear what the values in the output tables represent, unless I am missing some documentation.
Thanks so much,
Dan Moakley

MAST • 521 views
0
Entering edit mode
@andrew_mcdavid-11488
Last seen 4 months ago

In the MAITAnalysis vignette, section 5, a competitive gene set enrichment is performed

zlmCond <- zlm(~condition + cngeneson, sca)
# bootstrap, resampling cells
# R should be set to >50 if you were doing this for real.
boots <- bootVcov1(zlmCond, R=4)
gsea <- gseaAfterBoot(zlmCond, boots, sets_indices, CoefficientHypothesis("conditionStim"))


Where sets_indices is a list of indices of genes in sca that provides the different gene sets. This tests whether in cells with the same value of cngeneson (the cellular detection rate), the difference between condition=='Stim' and condition=='Unstim' is different between the geneset and the background. zlmCond is passed to gseaAfterBoot because it contains "true" values of the conditionStim coefficients.  According to the documentation ?gseaAfterBoot:

A 4D array is returned, with dimensions "set" (each module), "comp" ('disc'rete or 'cont'inuous), "metric" ('stat' gives the average of the coefficient, 'var' gives the variance of that average, 'dof' gives the number of genes that were actually tested in the set), "group" ('test' for the genes in test-set, "null" for all genes outside the test-set).

The @tests slot is not intended to be user-facing and doesn't have an accessor method. Instead, it's recommended to call calcZ or summary. But we can verify the documentation is accurate using the example provided in the ?gseaAfterBoot help.

data(vbetaFA)
vb1 = subset(vbetaFA, ncells==1)
vb1 = vb1[,freq(vb1)>.1][1:15,]
zf = zlm(~Stim.Condition, vb1)
boots = bootVcov1(zf, 5)
sets=list(A=1:5, B=3:10, C=15, D=1:5)
gsea=gseaAfterBoot(zf, boots, sets, CoefficientHypothesis('Stim.ConditionUnstim'))
dimnames(gsea@tests)


$set [1] "A" "B" "C" "D"$comp [1] "cont" "disc"

$metric [1] "stat" "var" "dof" "avgCor"$group [1] "test" "null"

So this is a 4 x 2 x 4 x 2 array with leading dimension giving the gene set, second dimension giving the model component, 3rd dimension collecting statistics from the bootstraps, and 4th dimension holding the values from within the gene set vs the background. The components "stat", "var", "dof" and "avgCor" represent the mean, variance, degrees of freedom and average correlation for a gene set, model component and gene-group combination.

For example

gsea@tests['A','cont','stat','test']


is the set A, continuous, mean of the test group.

0
Entering edit mode
dfm2129 • 0
@dfm2129-15329
Last seen 3.3 years ago

Thanks so much Andrew for the quick response.

I am working on getting it from the gseaAfterBoot code, but I was looking for the exact calculation done during this step: "This tests whether in cells with the same value of cngeneson (the cellular detection rate), the difference between condition=='Stim' and condition=='Unstim' is different between the geneset and the background."

If it is handy, great, but otherwise I can figure it out from the code!

0
Entering edit mode
The "test" happens in calcZ, which is using the means and variances collected in gsea@tests. The test is nothing fancy, just a normal theory Z-test. The fancy thing is that the variance collected in calcZ consists of a quadratic form on the gene-gene covariance matrix, and that the covariance matrix is never explicitly expanded. The math is recounted in section 4 of the supplement in https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0844-5. The code in gseaAfterBoot is probably entirely too clever for its own good, but the salient parts are lines 136-144 which use some functions defined at lines 43-91.
0
Entering edit mode

Thanks Andrew! I think I get the idea.