DESeq2, normalization for within-sample comparison, and 3'-end RNAseq data
theophile
I have a question about DESeq2 count normalization in the context of 3'-end RNA sequencing. I understand that the median-of-ratios normalization used by DESeq2 is not suitable for within-sample comparison of gene expression, notably because it does not account for differences in gene length.

However, with 3'-end RNAseq we end up with counts that reflect the molecular abundance of transcripts (1 molecule = 1 read), which is a big difference from traditional RNAseq, in which 1 molecule = n reads.

In this context, can we do within-sample comparisons of gene expression using DESeq2-normalized counts, or are there factors other than gene length that are still unaccounted for and that would prevent such comparisons?

I'd bet there are still technical factors besides abundance that affect 3' counts, e.g. GC content. You could use cqn to model this, though, to get closer to true abundance. (You can also incorporate the cqn normalization factors into DESeq2, and they will then be used by counts, vst, etc.)
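To make the point concrete, here is a minimal Python sketch of median-of-ratios size factors. It is not DESeq2's actual code: the function names `size_factors` and `normalize` are hypothetical and the counts are made-up toy data. It only illustrates that a single per-sample size factor rescales every gene in a sample by the same amount, so within-sample ratios between genes are left unchanged and any gene-specific bias (length, GC content, ...) stays in the normalized counts.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of genes, each a list of per-sample raw counts.
    Returns one size factor per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with a nonzero count in every sample, so that the
    # log geometric mean across samples is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to that gene's geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (made up): 4 genes x 2 samples; sample 2 is sequenced twice as deep.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [30, 60],
]

factors = size_factors(counts)
norm = normalize(counts, factors)

# Between-sample comparison is fixed: each gene now matches across samples.
# Within-sample comparison is untouched: gene 2 / gene 1 in sample 1 is the
# same before and after normalization.
ratio_before = counts[1][0] / counts[0][0]
ratio_after = norm[1][0] / norm[0][0]
print(factors, ratio_before, ratio_after)
```

Gene-wise normalization factors such as those produced by cqn differ in that they form a gene x sample matrix rather than one number per sample, which is why they can absorb gene-specific effects.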


