Question

Interpretation of cngeneson when doing differential expression analysis in MAST

francesco.brundu@gmail.com ▴ 40

Hi all,

I am running MAST for single-cell differential gene expression analysis. I followed the MAITAnalysis vignette (https://github.com/RGLab/MAST/blob/master/vignettes/MAITAnalysis.Rmd).

The code I'm using is the following:

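library(MAST)
# Build the SingleCellAssay from the expression matrix plus cell- and gene-level metadata
sca <- FromMatrix(as.matrix(df), cData = cData, fData = fData)
# Number of genes detected (expression > 0) in each cell
cdr2 <- colSums(assay(sca) > 0)
colData(sca)$cngeneson <- scale(cdr2)
cond <- factor(colData(sca)$type)
# Use type2 as the reference level
cond <- relevel(cond, 'type2')
colData(sca)$type <- cond
# Fit the model with cell type and number of detected genes as covariates
zlmCond <- zlm(~ type + cngeneson, sca, parallel = TRUE)
# Likelihood-ratio test for the type1 coefficient
summary <- summary(zlmCond, doLRT = 'type1')
print(summary, n = 4)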

The result I got is:

Fitted zlm with top 4 genes per contrast:
( log fold change Z-score )
 primerid type1    cngeneson
 Gene1      63.1*    0.3   
 Gene2      70.7*    5.8   
 Gene3     -23.9    87.8*  
 Gene4     -30.1    87.2*  
 Gene5     -17.1    96.8*  
 Gene6     -20.9    93.6*  
 Gene7      64.8*   10.5   
 Gene8      65.0*    9.2

If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01). However, how should I interpret the genes that are differentially expressed for cngeneson? If I recall correctly, cngeneson is the number of genes detected in each cell, but I don't understand what additional insight this contrast provides.

Thanks,

Francesco

Written 6.0 years ago by francesco.brundu@gmail.com

score 1 · Accepted Answer · 2018-04-23

Andrew_McDavid ▴ 270

If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

As the message at the top of the output states, the top 4 genes by Z-score are shown for both the type and cngeneson covariates. The * indicates the contrast for which the gene is in the top 4 list. All are extremely significant, with p-values far below 0.01.

However, how should I interpret the genes that are differentially expressed for cngeneson? If I recall correctly, cngeneson is the number of genes detected in each cell, but I don't understand what additional insight this contrast provides.

This generally isn't of direct interest: cngeneson is in the model as a nuisance covariate that adjusts for the number of genes detected per cell.
In any case, the print method is mainly there to provide a way to check that you coded your covariates correctly and to give you a quick look at the signal in your data.
Use `summary$datatable` for any downstream analysis.
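
As a rough sketch of what working from the datatable can look like: in the MAITAnalysis vignette linked in the question, the table is in long format, with one row per gene, contrast and component, the hurdle p-value on the `H` rows and the log fold change on the `logFC` rows. The example below assumes that layout and uses a small made-up data frame in place of a real fit (all values are invented, and base R data frames are used instead of data.table to keep it self-contained). With a real fit you would start from `as.data.frame(summary$datatable)`.

```r
# Hypothetical stand-in for summary$datatable; the values are made up.
# Column names follow the layout used in the MAITAnalysis vignette.
dt <- data.frame(
  primerid     = rep(c("Gene1", "Gene2", "Gene3"), each = 2),
  contrast     = "type1",
  component    = rep(c("H", "logFC"), times = 3),
  `Pr(>Chisq)` = c(1e-40, NA, 1e-12, NA, 0.20, NA),
  coef         = c(NA, 2.1, NA, 0.8, NA, -0.1),
  check.names  = FALSE,
  stringsAsFactors = FALSE
)

# Hurdle p-values are taken from the component == "H" rows ...
pvals <- dt[dt$contrast == "type1" & dt$component == "H",
            c("primerid", "Pr(>Chisq)")]
# ... and the log fold changes from the component == "logFC" rows.
lfc <- dt[dt$contrast == "type1" & dt$component == "logFC",
          c("primerid", "coef")]

# One row per gene with p-value, effect size and FDR-adjusted p-value
fcHurdle <- merge(pvals, lfc, by = "primerid")
fcHurdle$fdr <- p.adjust(fcHurdle[["Pr(>Chisq)"]], method = "fdr")
fcHurdle <- fcHurdle[order(fcHurdle$fdr), ]
print(fcHurdle)
```

Keeping the p-value and the effect size side by side makes it possible to filter on both, rather than relying on the printed top list alone.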

Thanks Andrew. If I only want the first 10 genes per contrast, I can safely take them directly from the output of print, right? Or are there any caveats?

Reply by francesco.brundu@gmail.com, 6.0 years ago

Well, if all you care about is the top 10 genes, sure. But you will probably want to know the p-values and effect sizes too, which are all in the `datatable`. You can get the top 10 genes per contrast by `order`ing it by contrast and then by p-value.
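
The ordering step could be sketched roughly as follows. The data frame here is a made-up stand-in for the hurdle (`H`) rows of the datatable, the column names are assumed to match the vignette's layout, and `n_top` is a name introduced only for this illustration.

```r
# Hypothetical hurdle ("H") rows of a summary datatable; values are made up.
h <- data.frame(
  primerid     = c("Gene1", "Gene2", "Gene3", "Gene1", "Gene2", "Gene3"),
  contrast     = rep(c("type1", "cngeneson"), each = 3),
  `Pr(>Chisq)` = c(1e-30, 1e-50, 1e-5, 1e-8, 1e-2, 1e-90),
  check.names  = FALSE,
  stringsAsFactors = FALSE
)

n_top <- 2  # use 10 on real data

# Order by contrast, then by p-value within each contrast
h <- h[order(h$contrast, h[["Pr(>Chisq)"]]), ]

# Position of each row within its contrast (1 = smallest p-value)
h$rank <- ave(seq_len(nrow(h)), h$contrast, FUN = seq_along)

# Keep the n_top most significant genes per contrast
top <- h[h$rank <= n_top, ]
print(top)
```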

Reply by Andrew_McDavid, 6.0 years ago

Thanks. I was asking because ordering by the logFC 'coef' gives me a set of DE genes with minimal overlap with the genes printed by summary (considering 10 DE genes). This is surely because one ordering uses the z-score (print) and the other uses the effect size (logFC coef). I don't fully understand which one I should use (I'd go for coef, but it is unclear to me why the z-score is displayed in the summary instead). Can you explain this?
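
The gap between the two orderings can be illustrated with made-up numbers. This sketch assumes a z-score behaves like an estimate divided by its standard error; how MAST computes the printed z-score is not stated in this thread, and the `se` column is hypothetical. A gene with a large but noisy coefficient ranks first by effect size and last by z-score.

```r
# Made-up estimates: GeneA has the largest effect but is very noisy,
# GeneC has a small effect that is estimated very precisely.
genes <- data.frame(
  primerid = c("GeneA", "GeneB", "GeneC"),
  coef     = c(3.0, 1.0, 0.5),
  se       = c(2.0, 0.05, 0.01),
  stringsAsFactors = FALSE
)

# Assumed relationship for this illustration: z = estimate / standard error
genes$z <- genes$coef / genes$se

by_coef <- genes$primerid[order(abs(genes$coef), decreasing = TRUE)]
by_z    <- genes$primerid[order(abs(genes$z), decreasing = TRUE)]

print(by_coef)  # ranking by effect size
print(by_z)     # ranking by z-score
```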

Reply by francesco.brundu@gmail.com, 6.0 years ago