Normalizing vs RNA content in scRNAseq data

Hi there, 

I'm dealing with two scenarios here: one uses Census from Monocle 2 to convert scRNA-seq data that has no spike-ins, and the other uses spike-in normalisation with scran. My downstream workflow uses Seurat for clustering, population discovery, et cetera, so I want to know which values are best for downstream processing. Should I use the normalised values *not* corrected for total RNA content in the cell? Or, particularly in the case of Census estimates, should I first divide by the total estimated mRNA content of the cell in question (as opposed to dividing read counts by the total number of reads, or using the size factors from scran's deconvolution method)?


monocle scran
Aaron Lun ★ 27k

Your question boils down to "should I preserve the effects of total RNA content or not?" If total RNA content is of interest to you, then you should use spike-in normalization, as this will not normalize out changes in content. If not, then you should use methods based on the assumption that the majority of genes are not differentially expressed (non-DE), such as the deconvolution method in scran.
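To make the distinction concrete, here is a minimal sketch in plain Python. It is not scran or Monocle code: the counts are made up, the helper names `size_factors` and `normalize` are hypothetical, and simple library-size scaling stands in for the non-DE-based methods (it is much cruder than scran's deconvolution). It assumes the same amount of spike-in RNA was added to each cell and captured equally well.

```python
# Toy counts for two cells (illustrative numbers, not real data).
# Cell B has twice the endogenous RNA of cell A; the same amount of
# spike-in RNA was added to each cell and captured equally well.
endogenous = {
    "cell_A": [10, 20, 30, 40],
    "cell_B": [20, 40, 60, 80],
}
spike_in = {
    "cell_A": [50, 50],
    "cell_B": [50, 50],
}


def size_factors(totals):
    """Scale per-cell totals so that the size factors average to 1."""
    mean_total = sum(totals.values()) / len(totals)
    return {cell: total / mean_total for cell, total in totals.items()}


def normalize(counts, factors):
    """Divide every count in a cell by that cell's size factor."""
    return {cell: [c / factors[cell] for c in values]
            for cell, values in counts.items()}


# Spike-in size factors are computed from the spike-in totals only.
spike_factors = size_factors({c: sum(v) for c, v in spike_in.items()})
# Library-size factors are computed from the endogenous totals.
lib_factors = size_factors({c: sum(v) for c, v in endogenous.items()})

by_spike = normalize(endogenous, spike_factors)
by_library = normalize(endogenous, lib_factors)

# Ratio of total normalized endogenous expression, cell B over cell A.
ratio_spike = sum(by_spike["cell_B"]) / sum(by_spike["cell_A"])
ratio_library = sum(by_library["cell_B"]) / sum(by_library["cell_A"])
print(ratio_spike, ratio_library)
```

Under these assumptions the spike-in factors are equal for both cells, so the twofold difference in total RNA survives normalization, whereas the library-size factors scale it away.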

The choice ultimately depends on your biological question and system, but as a rule of thumb: can you easily relate changes in total RNA content to biological function or causes? For example, T cells get bigger when they get activated, so there's a clear cause-and-effect relationship between the biology and total RNA content. In this case, I might want to preserve the total RNA content, as its biological interpretation is obvious.

In contrast, if I have two distinct cell types (with no information about their lineage), the nature of any differences in total RNA content between those two types is less clear. Of greater interest is the identity of the genes that are upregulated or downregulated in each cell type, conditional on the changes in total RNA content. In other words, we want to remove changes in total RNA content here, to offset differences in the overall transcriptional activity of each cell type when determining if a gene is turned on or off. In such cases, I would be inclined to use methods that assume a non-DE majority of genes.
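As a rough illustration of "conditional on the changes in total RNA content", the sketch below compares per-gene log2 fold changes with and without removing the content difference. The counts and gene names are invented, and a median-of-ratios size factor is used as a simple stand-in for any method that assumes a non-DE majority; it is not scran's deconvolution algorithm.

```python
import math
from statistics import median

# Toy counts for one representative cell of each type (made-up numbers).
# Type 2 has twice the total RNA of type 1, and gene "g5" is additionally
# upregulated fourfold relative to the rest of the transcriptome.
genes = ["g1", "g2", "g3", "g4", "g5"]
type_1 = [10, 20, 30, 40, 10]
type_2 = [20, 40, 60, 80, 80]

# Median-of-ratios size factor for type 2 relative to type 1. The median
# is set by the non-DE majority, so the single DE gene does not shift it.
size_factor = median(b / a for a, b in zip(type_1, type_2))

# Log2 fold changes with total RNA content left in (what you would see
# if normalization did not remove the content difference).
lfc_content_kept = {g: math.log2(b / a)
                    for g, a, b in zip(genes, type_1, type_2)}

# Log2 fold changes after dividing type 2 counts by the size factor.
lfc_content_removed = {g: math.log2((b / size_factor) / a)
                       for g, a, b in zip(genes, type_1, type_2)}

for g in genes:
    print(g, lfc_content_kept[g], lfc_content_removed[g])
```

With content kept, every gene in this toy looks upregulated in type 2 (log2 fold change of 1, or 3 for `g5`). With content removed, only `g5` stands out (log2 fold change of 2), which is the gene-level signal the paragraph above is after.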

