DESeq2 median of ratios normalization for signature score calculation
Serge • 0
@ddeeaf3d
Netherlands

I am using DESeq2 in my RNA-Seq analysis, and I am wondering whether it is possible to use the median of ratios normalization to calculate a signature score (an average across the genes in a signature)?

This post (https://hbctraining.github.io/Training-modules/planning_successful_rnaseq/lessons/sample_level_QC.html) claims that the median of ratios normalization (MRN) is not suitable for within-sample comparisons. I struggle to understand why this is the case, since a sum of negative binomial distributions is a negative binomial distribution. Or is MRN not suitable for within-sample comparisons but suitable for a signature score calculation?

Thank you so much!

MRN RNASeq Normalization DESeq2 • 152 views
@mikelove
United States

DESeq2 normalization (here we talk about the output of counts(dds, normalized=TRUE), not the statistical test from DESeq()) applies to all the counts in a sample equally.
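To make "applies to all the counts in a sample equally" concrete, here is a minimal sketch of the median of ratios idea in plain Python. It is not DESeq2's code: the function name `size_factors`, the toy counts, and the choice to skip genes with a zero count in any sample are assumptions made for illustration only.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: dict mapping sample name -> list of raw counts (same gene order)."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Assumption: only genes with a nonzero count in every sample are used,
    # so that the log geometric mean is finite.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # One factor per sample: the median ratio of its counts to the reference.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g]
                           for g in usable))
        for s in samples
    }

# Hypothetical toy data: sample B is sequenced twice as deeply as sample A.
counts = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
sf = size_factors(counts)
# Every count in a sample is divided by that sample's single factor.
normalized = {s: [c / sf[s] for c in counts[s]] for s in counts}
```

After dividing, the first three genes have equal normalized values in A and B, while the ratios between genes inside each sample are the same as before.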

What someone may mean by "within-sample comparison" is that you should be using a measure like TPM, which accounts for gene length. You could use a transcript quantifier like Salmon and import the results into R with tximport or tximeta.


Adding to that, I think some things on this website are not 100% correct. TMM normalization, for example, does not care about gene length, nor does it correct for it. Like DESeq2's method, it tries to find a single per-sample scaling factor to adjust for depth/composition between samples, and since this factor is per-sample it makes no sense for an intra-sample comparison. I mean, imagine you divide each count of a sample by the same factor: nothing changes other than the magnitude of the counts. Intra-sample comparisons, though, would need to correct for the fact that longer genes have more counts than shorter genes at an equal expression level, and (thinking aloud) that genes differ in GC content, mappability, etc., which needs to be considered.
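A small sketch of that point, in plain Python with made-up numbers. The gene counts, the lengths, and the helper names `scale_by_factor` and `tpm` are hypothetical, and the TPM here is the simple textbook formula using plain gene length, not the effective-length and bias-aware estimate that a quantifier like Salmon produces.

```python
def scale_by_factor(counts, factor):
    """Divide every count in one sample by the same per-sample factor."""
    return [c / factor for c in counts]

def tpm(counts, lengths_kb):
    """Textbook TPM: length-normalize first, then rescale to sum to one million."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Two hypothetical genes in one sample, equally expressed per kilobase,
# but gene Y is four times longer than gene X.
counts = [100, 400]
lengths_kb = [1.0, 4.0]

scaled = scale_by_factor(counts, 2.0)
ratio_raw = counts[1] / counts[0]      # Y looks 4x higher than X
ratio_scaled = scaled[1] / scaled[0]   # still 4x after per-sample scaling

t = tpm(counts, lengths_kb)            # length-corrected values
ratio_tpm = t[1] / t[0]                # X and Y now come out equal
```

The per-sample factor leaves the within-sample ratio untouched, whereas dividing by length is what removes the apparent 4x difference.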


Gene length and the other biases are exactly why we build on top of Salmon: all of that gets rolled into the TPM, as well as into the offset / normalization factor when using tximport.


Thank you so much for the answer! So it's valid to calculate a signature score based on the DESeq2-normalized data though it doesn't correct for the gene length (assuming no differential isoform usage)?


The short read RNA-seq counts are proportional to feature length, so you could compare counts with other datasets that also have length-proportional counts.

However, you wouldn't want to compare to counts that aren't proportional to length, e.g. many single-cell counts are 3' tag-based. Some long-read datasets also have counts that are not proportional to feature length, because they avoid fragmentation.
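For reference, a minimal sketch of the kind of signature score asked about at the top of the thread, in plain Python. The thread only specifies "an average across the genes in a signature"; the log2 transform with a pseudocount of 1, the function name `signature_score`, and the gene names and values are all assumptions for illustration.

```python
import math

def signature_score(normalized, signature):
    """Mean of log2(normalized count + 1) over the signature genes.

    normalized: dict mapping gene name -> normalized count for one sample.
    signature:  list of gene names in the signature.
    """
    values = [math.log2(normalized[g] + 1) for g in signature]
    return sum(values) / len(values)

# Hypothetical normalized counts for two samples from the same dataset.
sample_1 = {"G1": 7, "G2": 15, "G3": 100}
sample_2 = {"G1": 15, "G2": 31, "G3": 100}
signature = ["G1", "G2"]

score_1 = signature_score(sample_1, signature)
score_2 = signature_score(sample_2, signature)
difference = score_2 - score_1
```

The same genes are averaged in every sample, so the score is used here to compare samples with each other, not to compare genes within a sample.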