The reference for DESeq2's estimateDispersions() says that it uses the Cox-Reid likelihood, which was originally implemented in edgeR in 2010. I take it (correct me if I am wrong) that one drawback of CR vs GLM is that a GLM can incorporate the effective library size as an offset, whereas CR can't. For that reason, counts have to be normalized prior to applying CR.
In DESeq2, normalized counts are obtained by dividing the raw counts by the corresponding normalization factors:
library(DESeq2)
# dds is assumed to be an existing DESeqDataSet
rawCounts <- counts(dds, normalized = FALSE)
dds <- estimateSizeFactors(dds)
normFactors <- sizeFactors(dds)
normCounts <- counts(dds, normalized = TRUE)
# Divide each column (sample) by its size factor
normCountsNew <- t(t(rawCounts) / normFactors)
# normCounts and normCountsNew hold the same normalized counts.
I would like to compare this to the old (“classic”) version of edgeR that didn't rely on GLMs. They could have applied a simple division as above, but instead they generated pseudo-counts using “quantile-adjusted CML” from Robinson and Smyth, 2008, “Small sample estimation of negative binomial dispersion”. It looks like the authors of DESeq/DESeq2 were well aware of the pseudo-counts approach, but decided to simplify it instead. Is there any evidence that the simplified version isn't much worse?
Just to add to Gordon's answer: in general, there can be a considerable difference between simple scaling of the counts and the more sophisticated quantile adjustment method. Consider the simplest case, where we have a Poisson-distributed count y with mean u. Now, let's say that this count occurs in a library that's half the size of all the other libraries. I could "normalize" for the library size difference by doubling y, which ensures that the mean of the doubled count 2y is now 2u and comparable to the counts for the other libraries.
However, by doing so, I would increase the variance of the scaled count by 4-fold. As var(y) = u in the Poisson distribution, I would end up with var(2y) = 4u. As such, my scaled count is no longer Poisson-distributed (if it were, then var(2y) should be equal to 2u instead). In contrast, quantile adjustment will preserve the mean-variance relationship, generating a Poisson-distributed pseudo-count with mean 2u and variance 2u. This might seem like a subtle point, but it is quite important; accurate modeling of the variance is critical to the DE analysis, and failure to do so will result in loss of power or loss of error control.
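To make the variance argument concrete, here is a small simulation in base R. It is only a toy sketch: the value u = 50, the number of draws, and the mid-quantile mapping via ppois()/qpois() are my own choices for illustration, and this is not how edgeR computes its pseudo-counts (edgeR works with negative binomial quantiles, see equalizeLibSizes()).

```r
set.seed(1)
u <- 50      # Poisson mean in the half-sized library (illustrative value)
n <- 1e5     # number of simulated counts

y <- rpois(n, lambda = u)

# Direct scaling: the mean doubles, but the variance quadruples.
scaled <- 2 * y

# Toy quantile adjustment: take each count's mid-quantile under Poisson(u),
# then read off the same quantile under Poisson(2u).
p_mid  <- ppois(y - 1, lambda = u) + 0.5 * dpois(y, lambda = u)
pseudo <- qpois(p_mid, lambda = 2 * u)

# Compare mean and variance of the two "normalized" versions.
round(rbind(
  scaled = c(mean = mean(scaled), var = var(scaled)),
  pseudo = c(mean = mean(pseudo), var = var(pseudo))
), 1)
```

Both rows should have a mean close to 2u = 100. The variance of the scaled counts should sit near 4u = 200, while the variance of the quantile-mapped pseudo-counts should stay roughly at 2u = 100.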
So, basically, that's why quantile adjustment is used rather than direct scaling in "classic" edgeR. Of course, if you're using GLMs, this isn't an issue; normalization is handled naturally via the offsets without distorting the mean-variance relationship.
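As a rough illustration of the offset idea, the sketch below fits a plain Poisson GLM with base R's glm(), using log library size as the offset. The library sizes and the per-read rate are made up for the example, and I use a Poisson model only to match the example above; edgeR and DESeq2 fit negative binomial GLMs.

```r
set.seed(2)
n_lib    <- 200
lib_size <- rep(c(1e6, 5e5), each = n_lib / 2)  # second half: half-sized libraries
rate     <- 1e-4                                # hypothetical expression rate per read
y        <- rpois(n_lib, lambda = rate * lib_size)

# Library size enters as an offset on the log scale. The counts themselves
# are left untouched, so their Poisson mean-variance relationship is intact.
fit <- glm(y ~ 1 + offset(log(lib_size)), family = poisson())

# Back-transform the intercept to get the estimated rate per read.
rate_hat <- unname(exp(coef(fit)))
rate_hat
```

With an intercept-only model, the fitted rate is just the total count divided by the total library size, so the half-sized libraries contribute in proportion to their depth without any rescaling of y.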