Interpretation of cngeneson when doing differential expression analysis in MAST
@francescobrundugmailcom-5985

Hi all,

I am running MAST for single-cell differential gene expression analysis. I followed the vignette at https://github.com/RGLab/MAST/blob/master/vignettes/MAITAnalysis.Rmd .

The code I'm using is the following:

sca <- FromMatrix(as.matrix(df), cData = cData, fData = fData)
cdr2 <- colSums(assay(sca) > 0)
colData(sca)$cngeneson <- scale(cdr2)
cond <- factor(colData(sca)$type)
# used type2 as reference level
cond <- relevel(cond, 'type2')
colData(sca)$type <- cond
zlmCond <- zlm(~ type + cngeneson, sca, parallel = TRUE)
summary <- summary(zlmCond, doLRT='type1')
print(summary, n=4)

The result I got is:

Fitted zlm with top 4 genes per contrast:
( log fold change Z-score )
primerid   type1   cngeneson
Gene1       63.1*     0.3
Gene2       70.7*     5.8
Gene3      -23.9     87.8*
Gene4      -30.1     87.2*
Gene5      -17.1     96.8*
Gene6      -20.9     93.6*
Gene7       64.8*    10.5
Gene8       65.0*     9.2

If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

However, how should I interpret the cngeneson differentially expressed genes? If I recall correctly, cngeneson is the number of genes detected in each cell, but I cannot see what additional insight this contrast provides.

Thanks,
Francesco

@andrew_mcdavid-11488

> If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

As the message at the top of the output states, the top 4 genes by Z-score are shown for both the type and cngeneson covariates. The * indicates the contrast for which the gene is in the top-4 list. All are extremely significant, with p-values far below 0.01.

> However, how to interpret the cngeneson differentially expressed genes? If I recall correctly, cngeneson is the number of genes detected in each cell. But I am not able to understand which additional insight can provide this contrast.

This generally isn't of direct interest. In any case, the print method is mainly there to let you check that you coded your covariates correctly and to give you a quick look at the signal in your data. Use summary$datatable for any downstream analysis.


Thanks Andrew. If I only want the top 10 genes per contrast, can I safely take them directly from the output of print? Or are there any caveats?

Well, if all you care about is the top 10 genes, sure. But you probably will want to know p values and effect sizes, too, which are all in the datatable. You can get the top 10 genes by contrast by ordering it by contrast and then p value.
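
The ordering described above can be sketched in base R on a small stand-in table. This is only an illustration under assumptions: the data frame below is made up, and its column names (primerid, contrast, component, and Pr(>Chisq), with component "H" marking the hurdle test) follow the layout shown in the MAST vignette rather than anything printed in this thread. With real data you would start from summary$datatable instead of the toy table.

```r
# Toy stand-in for summary(zlmCond, doLRT = ...)$datatable.
# All gene names and p-values here are hypothetical.
dt <- data.frame(
  primerid  = rep(c("GeneA", "GeneB", "GeneC"), times = 2),
  contrast  = rep(c("type1", "cngeneson"), each = 3),
  component = "H",
  p         = c(1e-8, 1e-20, 1e-3, 1e-30, 1e-5, 1e-12),
  stringsAsFactors = FALSE
)
# Rename to the p-value column name assumed from the vignette.
names(dt)[names(dt) == "p"] <- "Pr(>Chisq)"

# Keep the hurdle rows, then order by contrast and by p-value within contrast.
hurdle  <- dt[dt$component == "H", ]
ordered <- hurdle[order(hurdle$contrast, hurdle[["Pr(>Chisq)"]]), ]

# Take the first n_top rows of each contrast (use 10 on real data).
n_top <- 2
top_by_contrast <- lapply(split(ordered, ordered$contrast), head, n = n_top)
top_by_contrast
```

Each element of top_by_contrast is a small table for one contrast, so p-values (and, on real data, the effect-size rows) stay attached to the gene names instead of being read off the printed output.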

Thanks. I was asking because ordering the datatable by the 'coef' of the logFC component gives me a set of DE genes with minimal overlap with the genes printed by summary (considering 10 DE genes). Presumably this is because print orders by Z-score, while I am ordering by effect size (the logFC coef). I don't fully understand which one I should use. I'd go for the coef, but it is unclear to me why the Z-score is displayed in the summary instead. Can you explain this?
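
A toy illustration of why the two orderings can disagree. It assumes that the Z-score is the estimate divided by its standard error, which is the usual definition but is not confirmed in this thread for MAST's printed values. All numbers below are made up.

```r
# Hypothetical logFC estimates (coef) and standard errors (se) for three genes.
toy <- data.frame(
  primerid = c("GeneA", "GeneB", "GeneC"),
  coef     = c(2.0, 0.5, 1.0),
  se       = c(1.0, 0.05, 0.2),
  stringsAsFactors = FALSE
)

# Z-score under the stated assumption: estimate scaled by its uncertainty.
toy$z <- toy$coef / toy$se

# Rank genes by absolute effect size and by absolute Z-score.
by_coef <- toy$primerid[order(-abs(toy$coef))]
by_z    <- toy$primerid[order(-abs(toy$z))]

by_coef
by_z
```

In this toy table GeneA has the largest coef but also the largest standard error, so it ranks first by effect size and last by Z-score. A gene with a modest but precisely estimated fold change can therefore top the Z-score list without appearing among the largest coefficients.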