Question: Interpretation of cngeneson when doing differential expression analysis in MAST
8 months ago by francesco.brundu@gmail.com

Hi all,

I am running MAST for Single-Cell differential gene expression analysis. I followed the vignette on https://github.com/RGLab/MAST/blob/master/vignettes/MAITAnalysis.Rmd .

The code I'm using is the following:

library(MAST)
sca <- FromMatrix(as.matrix(df), cData = cData, fData = fData)
# number of genes detected (expression > 0) in each cell
cdr2 <- colSums(assay(sca) > 0)
colData(sca)$cngeneson <- scale(cdr2)
cond <- factor(colData(sca)$type)
# used type2 as reference level
cond <- relevel(cond, 'type2')
colData(sca)$type <- cond
zlmCond <- zlm(~ type + cngeneson, sca, parallel = TRUE)
summary <- summary(zlmCond, doLRT='type1')
print(summary, n=4)

The result I got is:

Fitted zlm with top 4 genes per contrast:
( log fold change Z-score )
primerid  type1   cngeneson
Gene1      63.1*     0.3
Gene2      70.7*     5.8
Gene3     -23.9     87.8*
Gene4     -30.1     87.2*
Gene5     -17.1     96.8*
Gene6     -20.9     93.6*
Gene7      64.8*    10.5
Gene8      65.0*     9.2

If I understood correctly, Gene{1,2,7,8} are upregulated in type1 cells. The * should represent significance (p < 0.01).

However, how should I interpret the cngeneson differentially expressed genes? If I recall correctly, cngeneson is the number of genes detected in each cell. But I am not able to understand what additional insight this contrast can provide.

Thanks,
Francesco

modified 8 months ago by Andrew_McDavid • written 8 months ago by francesco.brundu@gmail.com

Answer: Interpretation of cngeneson when doing differential expression analysis in MAST
8 months ago by Andrew_McDavid wrote:

> If I understood correctly, Gene{1,2,7,8} are upregulated in type1 cells. The * should represent significance (p < 0.01).

As the message at the top of the output states, the top 4 genes by Z-score are shown for each of the type and cngeneson covariates. The * indicates the contrast for which the gene is in the top 4 list. All are extremely significant, with p-values much lower than 0.01.

> However, how should I interpret the cngeneson differentially expressed genes? If I recall correctly, cngeneson is the number of genes detected in each cell. But I am not able to understand what additional insight this contrast can provide.

This generally isn't of direct interest. In any case, the print method is mainly there to provide a way to check that you coded your covariates correctly and give you a quick look at the signal in your data. Use the summary$datatable for any downstream analysis.
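For anyone wondering what "use the summary$datatable" could look like in practice, here is a minimal sketch modelled on the pattern in the MAITAnalysis vignette. It runs on a small mock table with invented values so that it works without a fitted model. The column names (primerid, component, contrast, Pr(>Chisq), coef) and the meaning of the "H" and "logFC" components are assumed from the vignette, and extract_contrast is a hypothetical helper, not part of MAST. Check names(summary$datatable) on your own object before relying on it.

```r
# Mock stand-in for as.data.frame(summary$datatable); all values are invented.
# Assumed layout: one row per gene x contrast x component, where component
# "H" carries the hurdle (combined) p-value and "logFC" carries the effect size.
dt <- data.frame(
  primerid  = rep(c("Gene1", "Gene2", "Gene3"), each = 2),
  component = rep(c("H", "logFC"), times = 3),
  contrast  = "type1",
  p         = c(1e-20, NA, 1e-5, NA, 0.2, NA),
  coef      = c(NA, 2.1, NA, 1.4, NA, -0.1),
  stringsAsFactors = FALSE
)
names(dt)[names(dt) == "p"] <- "Pr(>Chisq)"

# Hypothetical helper: hurdle p-value, FDR and log fold change for one contrast.
extract_contrast <- function(dt, contrast_name) {
  sub   <- dt[dt$contrast == contrast_name, ]
  is_h  <- sub$component == "H"
  is_fc <- sub$component == "logFC"
  pvals <- data.frame(primerid = sub$primerid[is_h],
                      p_value  = sub[["Pr(>Chisq)"]][is_h],
                      stringsAsFactors = FALSE)
  lfc   <- data.frame(primerid = sub$primerid[is_fc],
                      logFC    = sub$coef[is_fc],
                      stringsAsFactors = FALSE)
  res <- merge(pvals, lfc, by = "primerid")
  # Benjamini-Hochberg adjustment across the genes tested for this contrast
  res$fdr <- p.adjust(res$p_value, method = "fdr")
  res[order(res$p_value), ]
}

res_type <- extract_contrast(dt, "type1")
print(res_type)
```

With a real fit you would pass as.data.frame(summary$datatable) and the contrast name exactly as it appears in the contrast column, which may differ from the 'type1' used in this mock.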

Thanks Andrew. If I only want the first 10 genes per contrast, I can safely take them directly from the output of print, right? Or is there any caveat?

Well, if all you care about is the top 10 genes, sure. But you probably will want to know p values and effect sizes, too, which are all in the datatable. You can get the top 10 genes by contrast by ordering it by contrast and then p value.
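The "order by contrast and then p value" step could be sketched roughly as below. The table is again a mock with invented values, standing in for the hurdle rows (component "H", by assumption) of summary$datatable with the p-value column renamed to p_value. top_n_per_contrast is a hypothetical helper, not a MAST function.

```r
# Mock hurdle rows; with a real fit these would be (assumed column names):
#   hurdle <- as.data.frame(summary$datatable)
#   hurdle <- hurdle[hurdle$component == "H", ]
#   hurdle$p_value <- hurdle[["Pr(>Chisq)"]]
hurdle <- data.frame(
  primerid = c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2", "Gene3"),
  contrast = rep(c("type1", "cngeneson"), each = 3),
  p_value  = c(1e-20, 1e-30, 0.5, 0.4, 1e-3, 1e-50),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n smallest p-values within each contrast.
top_n_per_contrast <- function(hurdle, n = 10) {
  # Order by contrast first, then by p-value within each contrast
  ordered <- hurdle[order(hurdle$contrast, hurdle$p_value), ]
  # Position of each row within its contrast after ordering (1 = smallest p)
  rank_in_contrast <- ave(ordered$p_value, ordered$contrast, FUN = seq_along)
  ordered[rank_in_contrast <= n, ]
}

top2 <- top_n_per_contrast(hurdle, n = 2)
print(top2)
```

This ranks by p-value, as suggested above, so it may not coincide exactly with the Z-score ranking that print uses.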