
Question: Normalized counts from DESeq2 results in similar but not equal total read count?
9 days ago by
University of Pennsylvania
ricardo38890 wrote:

Hi, I think this question has been asked before in different ways, but maybe someone can help me understand this a bit better. My question stems from the need to visually represent the expression levels of a gene between my two groups. I know plotCounts() uses the normalized counts from counts(dds, normalized=T), but when I looked at the total read counts per library, I realized that the DESeq2 normalization didn't quite result in equal library sizes across my samples. Is this expected, or is there a parameter I am missing that would give equal total reads across libraries? Along the same lines, if I want to visually represent the expression changes of a gene, should I be using the normalized counts from counts(x, normalized=T) as plotCounts() does, or should I be using the values from rlog() or vst()? Thank you for your help. My counts are below.

colSums(counts(dds))
WT_rep3    WT_rep4   WT_rep13   WT_rep14  Null_rep1  Null_rep2  Null_rep3  Null_rep4
25372528   25524255   35306510   34688537   28857148   29386607   28380245   24795934
Null_rep11 Null_rep12
66139067   34391514
colSums(counts(dds, normalized=T))
WT_rep3    WT_rep4   WT_rep13   WT_rep14  Null_rep1  Null_rep2  Null_rep3  Null_rep4
32400476   31980209   30906366   31123613   32129757   32307761   32902001   31931771
Null_rep11 Null_rep12
31123321   3126603

Answer: Normalized counts from DESeq2 results in similar but not equal total read count?
9 days ago by
United States
James W. MacDonald wrote:

The only time you would expect the normalized counts to sum to exactly the same value across libraries is if you expect that there are no differentially expressed genes, in which case any differences in library size are due only to technical differences (starting amount of mRNA, variability in library prep, etc).

But if there are some genes that are differentially expressed (and particularly if some of those genes are highly differentially expressed), then you would probably want to exclude them when computing the size factors that you will use to normalize, because the point of the normalization is to account for technical differences while still retaining biological differences. If you included the genes that are likely to be changing expression, then you run the risk of erasing some of the biological signal you want.
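To make this concrete, here is a minimal base-R sketch on invented toy data. It compares plain total-count scaling with a median-of-ratios size factor, which is the idea behind DESeq2's default normalization. The matrix values and the helper `median_of_ratios()` are made up for illustration; this is not DESeq2's own code, and it assumes there are no zero counts.

```r
# Toy count matrix: 5 genes x 2 samples (all values are invented).
# Sample B was sequenced twice as deeply as sample A, and gene5 is
# additionally strongly up-regulated in B.
counts <- matrix(
  c(100,  200,
    200,  400,
    300,  600,
    400,  800,
    100, 5000),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), c("A", "B"))
)

# Hypothetical helper: median-of-ratios size factors.
# Each sample is compared to a per-gene geometric mean, and the median
# ratio is used so that a few strongly changing genes have little influence.
# Assumes no zero counts (log(0) is not handled here).
median_of_ratios <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(col) exp(median(log(col) - log_geo_means)))
}

sf <- median_of_ratios(counts)
norm_mor <- sweep(counts, 2, sf, "/")

# Total-count scaling: forces every library to the same total.
lib <- colSums(counts)
norm_total <- sweep(counts, 2, lib / mean(lib), "/")

colSums(norm_mor)      # totals differ between A and B
colSums(norm_total)    # totals are equal by construction
norm_mor["gene1", ]    # unchanged gene has the same value in A and B
norm_total["gene1", ]  # unchanged gene now looks lower in B
```

In this toy case, forcing equal totals makes the four unchanged genes look down-regulated in sample B, because gene5 takes up most of that library. The median-based size factors leave the totals unequal but keep the unchanged genes comparable.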

There are lots of different ways to select genes that (hopefully) differ between samples only for technical rather than biological reasons, and if you care to know more, there are papers you can read (see, for example, the citation in ?estimateSizeFactors).