Question: Interpretation of cngeneson when doing differential expression analysis in MAST
8 months ago

Hi all,

I am running MAST for Single-Cell differential gene expression analysis. I followed the vignette on https://github.com/RGLab/MAST/blob/master/vignettes/MAITAnalysis.Rmd .

The code I'm using is the following:

library(MAST)

sca <- FromMatrix(as.matrix(df), cData = cData, fData = fData)
# number of genes detected in each cell
cdr2 <- colSums(assay(sca) > 0)
colData(sca)$cngeneson <- scale(cdr2)
cond <- factor(colData(sca)$type)
# used type2 as reference level
cond <- relevel(cond, 'type2')
colData(sca)$type <- cond
zlmCond <- zlm(~ type + cngeneson, sca, parallel = TRUE)
summary <- summary(zlmCond, doLRT='type1')
print(summary, n=4)

The result I got is:

Fitted zlm with top 4 genes per contrast:
( log fold change Z-score )
 primerid  type1   cngeneson
 Gene1      63.1*     0.3
 Gene2      70.7*     5.8
 Gene3     -23.9     87.8*
 Gene4     -30.1     87.2*
 Gene5     -17.1     96.8*
 Gene6     -20.9     93.6*
 Gene7      64.8*    10.5
 Gene8      65.0*     9.2

If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

However, how should I interpret the genes that are differentially expressed for cngeneson? If I recall correctly, cngeneson is the number of genes detected in each cell, but I am not able to understand what additional insight this contrast provides.

Thanks,
Francesco

modified 8 months ago by Andrew_McDavid • written 8 months ago by francesco.brundu@gmail.com

Answer: Interpretation of cngeneson when doing differential expression analysis in MAST
8 months ago by Andrew_McDavid wrote:

> If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

As the message at the top of the output states, the top 4 genes by Z-score are shown for both the type and cngeneson covariates. The * indicates the contrast for which the gene is in the top 4 list. All are extremely significant, with p-values much lower than 0.01.

> However, how to interpret the cngeneson differentially expressed genes? If I recall correctly, cngeneson is the number of genes detected in each cell. But I am not able to understand which additional insight can provide this contrast.

This generally isn't of direct interest. In any case, the print method is mainly there to provide a way to check that you coded your covariates correctly and to give you a quick look at the signal in your data. Use summary$datatable for any downstream analysis.
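
For readers unsure what the cngeneson covariate contains, here is a minimal sketch in base R of how it is built in the question's code: count the genes with non-zero expression in each cell, then center and scale that count. The toy matrix and the names `expr` and `n_detected` are invented for illustration; only `colSums(... > 0)` and `scale()` come from the code above.

```r
# Toy expression matrix (values invented): rows are genes, columns are cells.
expr <- matrix(
  c(0, 2, 0, 5,
    1, 0, 0, 3,
    0, 0, 0, 4,
    2, 1, 0, 6),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:4))
)

# Number of genes with non-zero expression in each cell
# (plays the role of cdr2 in the question).
n_detected <- colSums(expr > 0)

# Center to mean 0 and scale to standard deviation 1,
# mirroring scale(cdr2) in the question.
cngeneson <- as.numeric(scale(n_detected))

print(n_detected)
print(round(cngeneson, 3))
```

Because the covariate is standardized, its coefficient describes the change associated with a one-standard-deviation difference in the number of detected genes, which is a property of the cell rather than of the biological condition being compared.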

Thanks Andrew. If I only want the top 10 genes per contrast, I can safely take them directly from the output of print, right? Or is there any caveat?

Well, if all you care about is the top 10 genes, sure. But you probably will want to know p values and effect sizes, too, which are all in the datatable. You can get the top 10 genes by contrast by ordering it by contrast and then p value.
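
The ordering described above could be sketched roughly as follows. To keep the example runnable without MAST, it uses a mock data.frame with invented values in place of the real summary table; the column names (primerid, component, contrast, `Pr(>Chisq)`) and the "H" (hurdle) component label follow what the MAST vignette uses, but check them against your own object. With real data you would start from something like `as.data.frame(summary$datatable)`. As far as I understand, p-values are only filled in for contrasts passed to doLRT, so treat rows with NA p-values accordingly.

```r
# Mock stand-in for the summary datatable; all values are invented.
set.seed(1)
dt <- data.frame(
  primerid     = rep(paste0("Gene", 1:30), times = 2),
  component    = "H",
  contrast     = rep(c("type1", "cngeneson"), each = 30),
  "Pr(>Chisq)" = runif(60),
  check.names  = FALSE,
  stringsAsFactors = FALSE
)

# Keep the combined hurdle test rows only.
hurdle <- dt[dt$component == "H", ]

# Order by contrast, then by p-value (smallest first).
hurdle <- hurdle[order(hurdle$contrast, hurdle[["Pr(>Chisq)"]]), ]

# Take the first 10 rows within each contrast and stack them back together.
top10 <- do.call(rbind, lapply(split(hurdle, hurdle$contrast), head, n = 10))

print(top10[, c("contrast", "primerid", "Pr(>Chisq)")])
```

Unlike the print output, this keeps the p-values alongside the gene names, and effect sizes can be carried along in the same way if the corresponding columns are present in the table.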