Question

DESeq2 median of ratios normalization for signature score calculation

1

Serge ▴ 10

I am using DESeq2 in my RNA-Seq analysis, and I am wondering whether it is possible to use the median of ratios normalization to calculate a signature score (an average across the genes in a signature).

This post (https://hbctraining.github.io/Training-modules/planning_successful_rnaseq/lessons/sample_level_QC.html) claims that the median of ratios normalization (MRN) is not suitable for within-sample comparisons. I struggle to understand why this is the case, since a sum of negative binomial distributions is a negative binomial distribution. Or is MRN not suitable for within-sample comparisons but suitable for a signature score calculation?

Thank you so much!

MRN RNASeq Normalization DESeq2 • 3.0k views

ADD COMMENT • link updated 2.3 years ago by Michael Love 41k • written 2.3 years ago by Serge ▴ 10

score 1 · Answer 1 · 2022-01-13

1

Michael Love 41k

DESeq2 normalization (here we talk about the output of counts(dds, normalized=TRUE), not the statistical test from DESeq()) applies to all the counts in a sample equally.
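To make "applies to all the counts in a sample equally" concrete, here is a minimal Python sketch of the median of ratios idea. It is an illustration under simplifying assumptions, not DESeq2's actual code: the function names `size_factors` and `normalize` and the toy counts are made up for this example, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of genes, each a list of raw counts (one per sample)."""
    n_samples = len(counts[0])
    # Use only genes with a positive count in every sample, so the
    # per-gene geometric mean (computed on the log scale) is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to that gene's geometric mean
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        # One number per sample: the median ratio across genes
        factors.append(math.exp(median(log_ratios)))
    return factors

def normalize(counts, factors):
    # Every count in sample j is divided by the same factor.
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Toy data (hypothetical): sample 2 is sample 1 sequenced at twice the depth.
counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
sf = size_factors(counts)
norm = normalize(counts, sf)
print(sf)
print(norm[:3])
```

In this toy case the second size factor is twice the first, and the first three genes end up with equal normalized counts in both samples. Note that each sample gets exactly one factor; nothing in the calculation is gene-specific.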

What someone may mean by "within-sample comparison" is perhaps that you should be using a measure like TPM, which accounts for gene length. You could use a transcript quantifier like Salmon and import the results into R with tximport or tximeta.
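For readers unfamiliar with TPM, the following Python sketch shows the basic arithmetic: counts are divided by feature length first, then rescaled so each sample sums to one million. This is a simplified illustration only. The function name `tpm` and the numbers are hypothetical, and plain annotated lengths are used here, whereas a quantifier like Salmon works with effective lengths and bias models.

```python
def tpm(counts, lengths_kb):
    """counts and lengths_kb are per-gene lists for ONE sample."""
    # Step 1: length-correct each gene (reads per kilobase)
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    # Step 2: rescale so the sample sums to one million
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical sample: two genes with equal counts but a 4x length difference
counts = [400, 400]
lengths_kb = [4.0, 1.0]
values = tpm(counts, lengths_kb)
print(values)
```

Although both genes have the same count, the shorter gene gets a TPM four times higher, which is the kind of gene-to-gene comparison within a sample that a single per-sample scaling factor cannot provide.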

ADD COMMENT • link 2.3 years ago Michael Love 41k

1

Adding to that, I think some things on that website are not 100% correct. TMM normalization, for example, does not care about gene length, nor does it correct for it. Like DESeq2's method, it tries to find a single per-sample scaling factor to adjust for depth/composition differences between samples, and since this factor is per-sample it makes no sense for an intra-sample comparison. Imagine dividing every count in a sample by the same factor: nothing changes other than the magnitude of the counts. Intra-sample comparisons, though, would need to correct for the fact that longer genes have more counts than shorter genes at equal expression level, and (thinking aloud) that genes differ in GC content, mappability, etc., which also needs to be considered.
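The "divide each count by the same factor" point can be checked in a few lines of Python. The gene names, counts, lengths, and the size factor of 1.6 below are all hypothetical values chosen for illustration.

```python
# One sample, two genes (hypothetical values)
counts = {"geneA": 400, "geneB": 100}
size_factor = 1.6  # assumed per-sample scaling factor

# Per-sample normalization: the same divisor for every gene
normalized = {g: c / size_factor for g, c in counts.items()}

ratio_raw = counts["geneA"] / counts["geneB"]
ratio_norm = normalized["geneA"] / normalized["geneB"]

# Length correction uses a gene-specific divisor, so the ratio can change
lengths_kb = {"geneA": 4.0, "geneB": 1.0}
per_kb = {g: counts[g] / lengths_kb[g] for g in counts}
ratio_per_kb = per_kb["geneA"] / per_kb["geneB"]

print(ratio_raw, ratio_norm, ratio_per_kb)
```

The gene A to gene B ratio is 4 both before and after per-sample scaling, while dividing by length brings it to 1: in this made-up case the apparent 4x difference is entirely explained by gene A being four times longer.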

ADD REPLY • link 2.3 years ago ATpoint ★ 4.0k

0

The gene length and other bias considerations are exactly why we build on top of Salmon: all of that gets rolled into the TPM, as well as into the offset / normalization factor when using tximport.

ADD REPLY • link 2.3 years ago Michael Love 41k

0

Thank you so much for the answer! So it's valid to calculate a signature score based on the DESeq2-normalized data even though it doesn't correct for gene length (assuming no differential isoform usage)?

ADD REPLY • link 2.3 years ago Serge ▴ 10

0

Short-read RNA-seq counts are proportional to feature length, so you could compare these counts with other datasets that also have length-proportional counts.

However, you wouldn't want to compare them to counts that aren't proportional to length. For example, a lot of single-cell counts are 3' tag-based, and some long-read datasets also have counts that are not proportional to feature length, because they avoid fragmentation.
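As a rough illustration of a signature score compared between samples, here is a Python sketch. The scoring rule (mean of log2 normalized counts over the signature genes, with an optional pseudocount) is an assumption made for this example, as are the function name `signature_score`, the gene names, and all numbers. It also assumes each gene has the same length in both samples, i.e. no differential isoform usage.

```python
import math

def signature_score(norm_counts, signature, pseudocount=1.0):
    """Mean log2(normalized count + pseudocount) over signature genes, one sample."""
    return sum(math.log2(norm_counts[g] + pseudocount) for g in signature) / len(signature)

# Hypothetical normalized counts for two samples
sample_a = {"g1": 120.0, "g2": 30.0, "g3": 500.0, "g4": 8.0}
sample_b = {"g1": 240.0, "g2": 60.0, "g3": 480.0, "g4": 9.0}
signature = ["g1", "g2"]

score_a = signature_score(sample_a, signature)
score_b = signature_score(sample_b, signature)

# Without a pseudocount, dividing by a fixed gene length shifts both
# samples' scores by the same constant, so the difference is unchanged.
lengths_kb = {"g1": 4.0, "g2": 1.0}  # assumed identical in both samples

def per_kb(sample):
    return {g: sample[g] / lengths_kb[g] for g in signature}

diff_counts = (signature_score(sample_b, signature, 0.0)
               - signature_score(sample_a, signature, 0.0))
diff_per_kb = (signature_score(per_kb(sample_b), signature, 0.0)
               - signature_score(per_kb(sample_a), signature, 0.0))

print(score_a, score_b, diff_counts, diff_per_kb)
```

In this toy setup the between-sample difference in score is the same with or without length correction, because the length term cancels on the log scale. That cancellation does not hold when comparing against data whose counts are not proportional to length.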

ADD REPLY • link 2.3 years ago Michael Love 41k