Interpretation of cngeneson when doing differential expression analysis in MAST
@francescobrundugmailcom-5985

Hi all,

I am running MAST for single-cell differential gene expression analysis. I followed the MAITAnalysis vignette (https://github.com/RGLab/MAST/blob/master/vignettes/MAITAnalysis.Rmd).

The code I'm using is the following:

library(MAST)

sca <- FromMatrix(as.matrix(df), cData = cData, fData = fData)
# number of genes detected (non-zero) in each cell, then centred and scaled
cdr2 <- colSums(assay(sca) > 0)
colData(sca)$cngeneson <- scale(cdr2)
cond <- factor(colData(sca)$type)
# use type2 as the reference level
cond <- relevel(cond, 'type2')
colData(sca)$type <- cond
zlmCond <- zlm(~ type + cngeneson, sca, parallel = TRUE)
summary <- summary(zlmCond, doLRT = 'type1')
print(summary, n = 4)
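
To make the cngeneson step concrete, here is a minimal sketch on a made-up matrix. The matrix `expr` and its values are invented for illustration and are not part of the analysis above. It only shows what `colSums(... > 0)` followed by `scale()` produces.

```r
# Toy expression matrix: rows are genes, columns are cells (values are made up).
expr <- matrix(c(0, 2, 5,
                 1, 0, 3,
                 0, 0, 4,
                 2, 1, 6),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("Gene", 1:4), paste0("Cell", 1:3)))

# Number of genes with non-zero expression in each cell.
genes_detected <- colSums(expr > 0)

# Centre and scale; scale() returns a one-column matrix, so convert to a vector.
cngeneson <- as.numeric(scale(genes_detected))
names(cngeneson) <- colnames(expr)

print(genes_detected)
print(round(cngeneson, 3))
```

After scaling, the covariate has mean 0 and standard deviation 1 across cells, so its coefficient is expressed per standard deviation of the detection count.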

The result I got is:

Fitted zlm with top 4 genes per contrast:
( log fold change Z-score )
 primerid type1    cngeneson
 Gene1      63.1*    0.3   
 Gene2      70.7*    5.8   
 Gene3     -23.9    87.8*  
 Gene4     -30.1    87.2*  
 Gene5     -17.1    96.8*  
 Gene6     -20.9    93.6*  
 Gene7      64.8*   10.5   
 Gene8      65.0*    9.2   

If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01). However, how should I interpret the genes that are differentially expressed for cngeneson? If I recall correctly, cngeneson is the number of genes detected in each cell, but I do not understand what additional insight this contrast can provide.

Thanks,

Francesco

@andrew_mcdavid-11488

> If I understood correctly, type1 cells are differentially upregulated in Gene{1,2,7,8}. The * should represent significance (p < 0.01).

For both the type and cngeneson covariates, as the message at the top of the output states, the top 4 genes by Z-score are shown. The * indicates the contrast for which the gene is in the top 4 list. All are extremely significant, with p-values much lower than 0.01.

> However, how should I interpret the genes that are differentially expressed for cngeneson? If I recall correctly, cngeneson is the number of genes detected in each cell, but I do not understand what additional insight this contrast can provide.

This generally isn't of direct interest.
In any case, the print method is mainly there to provide a way to check that you coded your covariates correctly and give you a quick look at the signal in your data.
Use `summary$datatable` for any downstream analysis.
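
As a rough sketch of what that downstream step can look like, the example below pairs the hurdle p-value with the log fold change for one contrast. It runs on a small hypothetical data frame, not on real MAST output: the values are invented, and the column `p_value` stands in for the column that the real table calls `Pr(>Chisq)`. The layout assumed here (p-values on the `H` component rows, coefficients on the `logFC` rows) follows the MAITAnalysis vignette.

```r
# Hypothetical stand-in for summary$datatable (all values are invented).
dt <- data.frame(
  primerid  = rep(c("GeneA", "GeneB", "GeneC"), each = 2),
  contrast  = "type1",
  component = rep(c("H", "logFC"), times = 3),
  coef      = c(NA, 1.2, NA, -0.4, NA, 2.5),
  p_value   = c(1e-8, NA, 0.03, NA, 1e-4, NA),  # real column name: `Pr(>Chisq)`
  stringsAsFactors = FALSE
)

# Hurdle p-values and logFC coefficients live on different rows of the table.
hurdle <- dt[dt$contrast == "type1" & dt$component == "H",
             c("primerid", "p_value")]
logfc  <- dt[dt$contrast == "type1" & dt$component == "logFC",
             c("primerid", "coef")]

# One row per gene, with both the p-value and the effect size.
res <- merge(hurdle, logfc, by = "primerid")
res$fdr <- p.adjust(res$p_value, method = "fdr")
res <- res[order(res$p_value), ]
print(res)
```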

Thanks Andrew. If I only want the first 10 genes per contrast, I can safely take them directly from the output of print, right? Or are there any caveats?

Well, if all you care about is the top 10 genes, sure. But you will probably want to know p-values and effect sizes too, which are all in the `datatable`. You can get the top 10 genes per contrast by `order`ing it by contrast and then by p-value.
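
A minimal sketch of that ordering, again on an invented table rather than real output: `hurdle` is a hypothetical data frame holding one p-value per gene and contrast, and `n_top` is set to 2 only to keep the toy example short.

```r
# Hypothetical per-gene, per-contrast p-values (all values are invented).
hurdle <- data.frame(
  primerid = rep(paste0("Gene", 1:4), times = 2),
  contrast = rep(c("type1", "cngeneson"), each = 4),
  p_value  = c(1e-6, 0.2, 1e-9, 0.01,
               0.5, 1e-12, 0.03, 1e-3),
  stringsAsFactors = FALSE
)

# Sort by contrast, then by p-value within each contrast.
hurdle <- hurdle[order(hurdle$contrast, hurdle$p_value), ]

# Position of each row within its (already sorted) contrast.
n_top <- 2
rank_in_contrast <- ave(hurdle$p_value, hurdle$contrast, FUN = seq_along)

# Keep the n_top most significant genes per contrast.
top_genes <- hurdle[rank_in_contrast <= n_top, ]
print(top_genes)
```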

Thanks. I was asking because ordering by the 'coef' of logFC gives me a set of DE genes with minimal overlap with the genes printed by summary (considering 10 DE genes). This is presumably because one ordering uses the Z-score (print) and the other uses the effect size (logFC coef). I don't fully understand which one I should use (I'd go for the coef, but it is unclear to me why the Z-score is displayed in the summary instead). Can you explain this?
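
The toy example below illustrates how the two orderings can disagree. It assumes that the Z-score behaves like an estimate divided by its standard error, which is an assumption made here for illustration and not something stated in this thread. The data frame `logfc` and its `se` column are invented.

```r
# Invented logFC rows: coef is the estimated log fold change,
# se is a hypothetical standard error for that estimate.
logfc <- data.frame(
  primerid = paste0("Gene", 1:4),
  coef     = c(3.0, 0.8, 2.0, 0.5),
  se       = c(2.0, 0.1, 1.0, 0.05),
  stringsAsFactors = FALSE
)

# A Wald-type z-score: the estimate divided by its uncertainty.
logfc$z <- logfc$coef / logfc$se

# Rank genes by absolute effect size and by absolute z-score.
by_coef <- logfc$primerid[order(-abs(logfc$coef))]
by_z    <- logfc$primerid[order(-abs(logfc$z))]

print(by_coef)
print(by_z)
```

In this toy data the gene with the largest coefficient also has the largest standard error, so it drops to the bottom of the Z-score ranking. A large but noisy effect and a small but precisely estimated one are ranked very differently by the two criteria.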
