Question: Mean Expression Calculation scRNA-seq (All cells or only Expressed Cells)
19 months ago by
chitsazanalex10 wrote:

I'm running a differential expression test with monocle. Their documentation shows that differentialGeneTest() reports which features differ under your model, but it doesn't tell you which specific genes go up in particular groups. Per their documentation, they state "We could also simply compute summary statistics such as mean or median expression level on a per-CellType basis to see this, which might be handy if we are looking at more than a handful of genes."

 

This makes sense, and I have calculated a normalized expression matrix. My main question is: does one normally use all single cells to calculate the mean expression, including the cells that have no detectable expression, or just the expressing cells? For example, take a scenario where Condition 1 has 400 total cells, of which 300 express geneA, and Condition 2 has 200 total cells, of which only 50 express geneA. If I'm calculating a fold change (FC) for geneA, do I compare

meanexpression(400 TOTAL cells)/meanexpression(200 TOTAL cells)  or

meanexpression(300 EXPRESSING cells)/meanexpression(50 EXPRESSING cells).

 

I can see how there would be bias in both and so I wonder which is used in the field? 
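To make the two options concrete, here is a minimal sketch in plain Python. The expression values are hypothetical toy data (every expressing cell is set to 2.0, which is an assumption for illustration only), and the helper names `mean_all` and `mean_expressing` are made up for this example. It shows that the all-cells mean equals the expressing-cells mean multiplied by the fraction of expressing cells, so the two fold changes can differ a lot.

```python
def mean_all(values):
    """Mean over every cell, zeros included."""
    return sum(values) / len(values)


def mean_expressing(values):
    """Mean over cells with detectable (non-zero) expression only."""
    expressed = [v for v in values if v > 0]
    return sum(expressed) / len(expressed) if expressed else 0.0


# Hypothetical toy data for geneA: same level in every expressing cell.
cond1 = [2.0] * 300 + [0.0] * 100   # 400 cells, 300 expressing
cond2 = [2.0] * 50 + [0.0] * 150    # 200 cells, 50 expressing

fc_all = mean_all(cond1) / mean_all(cond2)
fc_expressing = mean_expressing(cond1) / mean_expressing(cond2)

print("FC using all cells:       ", fc_all)
print("FC using expressing cells:", fc_expressing)
```

With these toy numbers the expressing-cells FC is 1 (same level per expressing cell), while the all-cells FC is 3, because it also reflects the difference in the fraction of expressing cells (75% vs 25%).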

Answer: Mean Expression Calculation scRNA-seq (All cells or only Expressed Cells)
19 months ago by
davide risso830
University of Padova
davide risso830 wrote:

Hi,

I'm not too familiar with the monocle differential expression model, but I'll try to answer your question, which seems more general than monocle.

There is no consensus yet in the field on the best way to compare mean expression across conditions. However, there is a very thorough review of differential expression methods that, indirectly, answers your question (e.g., a t-test compares the means without treating 0's in any special way, and it seems to work well): https://www.nature.com/articles/nmeth.4612
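As a rough illustration of what "not treating 0's in any special way" means, here is a sketch of a Welch t-statistic computed over all cells, zeros included, using only the Python standard library. The data are hypothetical toy values and `welch_t` is a helper written for this example, not a function from any of the packages discussed here.

```python
import math
import statistics


def welch_t(x, y):
    """Welch's t-statistic for two samples with unequal variances."""
    mx, my = statistics.mean(x), statistics.mean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)  # sample variances
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))


# Hypothetical toy data: zeros stay in the samples like any other value.
cond1 = [2.0] * 300 + [0.0] * 100   # 400 cells, 300 expressing
cond2 = [2.0] * 50 + [0.0] * 150    # 200 cells, 50 expressing

t_stat = welch_t(cond1, cond2)
print("Welch t-statistic (all cells):", t_stat)
```

Here the zeros enter both the means and the variances, so the test picks up the difference in the fraction of expressing cells as well as any difference in expression level.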

In our work, we have used a zero-inflated model to "downweight" the 0's that are in excess compared to a negative binomial distribution. This seems to help boost the performance of DE methods developed for bulk RNA-seq and might be a good strategy in your case: i.e., instead of either removing or keeping the 0's in your mean computation, you can downweight them so that they do not influence the mean so much (think about it as a middle ground between your two solutions).
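The "middle ground" idea can be sketched as a weighted mean in which zeros get a weight between 0 and 1. This is only an illustration under strong assumptions: a single constant `zero_weight` is applied to every zero, and the function name and toy data are hypothetical. It is not how the weights are obtained in the paper, where they are estimated from a fitted zero-inflated model rather than set by hand.

```python
def weighted_mean(values, zero_weight):
    """Mean in which zeros count with weight `zero_weight` and non-zeros with weight 1."""
    weights = [zero_weight if v == 0 else 1.0 for v in values]
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# Hypothetical toy data: 200 cells, 50 expressing geneA at level 2.0.
cond2 = [2.0] * 50 + [0.0] * 150

mean_keep = weighted_mean(cond2, 1.0)   # zeros fully kept: all-cells mean
mean_drop = weighted_mean(cond2, 0.0)   # zeros removed: expressing-cells mean
mean_mid = weighted_mean(cond2, 0.2)    # zeros downweighted: in between

print(mean_keep, mean_mid, mean_drop)
```

Setting the weight to 1 recovers the all-cells mean and setting it to 0 recovers the expressing-cells mean, so any value in between interpolates between the two options in the question.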

More details on our approach can be found in the paper: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1406-4 and the method is implemented in the zinbwave Bioconductor package.
