Question

Normalizing vs RNA content in scRNAseq data

ankur.chakravarthy.10 ▴ 40

Hi there,

I'm dealing with two scenarios here: one uses Census from Monocle2 to convert scRNAseq data with no spike-ins, and the other uses spike-in normalisation with scran. My downstream workflow uses Seurat for clustering, population discovery, et cetera. I want to know whether the best input for downstream processing is the normalised values *not* corrected for total RNA content in the cell, or whether I should first divide by the cell's total estimated mRNA content, particularly in the case of Census estimates (as opposed to dividing read counts by the total number of reads, or using scran's deconvolution sum factors).

Ta!

monocle scran • 1.1k views

Updated 6.8 years ago by Aaron Lun ★ 28k • written 6.8 years ago by ankur.chakravarthy.10 ▴ 40

score 3 · Accepted Answer · 2017-07-14

Your question boils down to "should I preserve the effects of total RNA content or not". If total RNA content is of interest to you, then you should use spike-in normalization, as this will not normalize out changes in content. If not, then you should use methods based on the assumption of a non-DE majority of genes, such as the deconvolution method in scran.
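To make the distinction concrete, here is a minimal sketch in plain Python with made-up numbers. It assumes two cells with identical composition, where the second cell holds twice as much endogenous mRNA, the same amount of spike-in was added to both, and the second cell was captured at half the efficiency. The library-size factors are only a simple stand-in for methods that assume a non-DE majority; this is not scran's deconvolution algorithm, and all names and counts are hypothetical.

```python
def size_factors(totals):
    """Scale per-cell totals so that the size factors average to 1."""
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]


def normalize(counts_per_cell, factors):
    """Divide every count in a cell by that cell's size factor."""
    return [[c / f for c in counts] for counts, f in zip(counts_per_cell, factors)]


def content_ratio(normalized):
    """Total normalized expression of cell B relative to cell A."""
    return sum(normalized[1]) / sum(normalized[0])


# Hypothetical observed counts (three endogenous genes per cell).
# Cell B has 2x the mRNA of cell A but half the capture efficiency,
# so the raw endogenous counts happen to look identical.
endogenous = [[10, 20, 70],   # cell A
              [10, 20, 70]]   # cell B
spike_totals = [10, 5]        # same spike-in amount added, B captured at half rate

# Spike-in factors track technical capture only.
spike_sf = size_factors(spike_totals)
# Library-size factors track capture AND total RNA content together.
lib_sf = size_factors([sum(cell) for cell in endogenous])

spike_norm = normalize(endogenous, spike_sf)
lib_norm = normalize(endogenous, lib_sf)

print("B/A total after spike-in normalization:", round(content_ratio(spike_norm), 3))
print("B/A total after library-size normalization:", round(content_ratio(lib_norm), 3))
```

With these toy numbers, spike-in normalization recovers the twofold difference in total content between the cells, whereas library-size normalization makes the two cells look identical.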

The choice ultimately depends on your biological question and system, but as a rule of thumb: can you easily relate changes in total RNA content to biological function or causes? For example, T cells get bigger when they get activated, so there's a clear cause-and-effect relationship between the biology and total RNA content. In this case, I might want to preserve the total RNA content, as its biological interpretation is obvious.

In contrast, if I have two distinct cell types (with no information about their lineage), the nature of any differences in total RNA content between those two types is less clear. Of greater interest is the identity of the genes that are upregulated or downregulated in each cell type, conditional on the changes in total RNA content. In other words, we want to remove changes in total RNA content here, to offset differences in the overall transcriptional activity of each cell type when determining if a gene is turned on or off. In such cases, I would be inclined to use methods that assume a non-DE majority of genes.
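A small sketch of what "conditional on the changes in total RNA content" means for fold changes, again with hypothetical numbers. It assumes absolute abundances are already known for two cell types (for example, from spike-in normalization), and uses the median gene-wise ratio as a simple stand-in for a non-DE-majority scaling factor; this is not scran's deconvolution method.

```python
import math
from statistics import median

# Hypothetical absolute abundances for four genes in two cell types.
# Type Y has roughly 2x the RNA content; only the last gene changes beyond that.
type_x = [100, 200, 300, 400]
type_y = [200, 400, 600, 3200]

ratios = [y / x for x, y in zip(type_x, type_y)]

# Content preserved: every gene appears upregulated in type Y.
absolute_lfc = [math.log2(r) for r in ratios]

# Content removed: rescale by the median ratio, assuming most genes are not DE.
content_factor = median(ratios)
relative_lfc = [math.log2(r / content_factor) for r in ratios]

print("log2 fold changes, content preserved:", absolute_lfc)
print("log2 fold changes, content removed:  ", relative_lfc)
```

When content is preserved, all four genes show a positive log fold change; once the global shift is removed, only the last gene stands out, which is the gene-level signal the paragraph above is after.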