The reference for DESeq2's estimateDispersions() says that it uses the Cox-Reid adjusted profile likelihood originally implemented in edgeR in 2010. I take it (correct me if I am wrong) that one drawback of the CR approach versus a full GLM fit is that a GLM can incorporate the effective library size as an offset, whereas the CR adjustment by itself cannot. For that reason, counts have to be normalized before applying CR.
In DESeq2, normalized counts are obtained by dividing the raw counts by the corresponding size factors:
rawCounts <- counts(dds, normalized = FALSE)
dds <- estimateSizeFactors(dds)
sizeFacs <- sizeFactors(dds)
normCounts <- counts(dds, normalized = TRUE)
normCountsNew <- t(t(rawCounts) / sizeFacs)
all.equal(normCounts, normCountsNew)  # TRUE: the two are the same normalized counts
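For reference, those size factors come from DESeq2's median-of-ratios estimator, which can be reproduced in a few lines of base R. This is a hedged sketch on a toy Poisson matrix standing in for real data; the variable names are my own:

```r
set.seed(1)
counts <- matrix(rpois(20, lambda = 50), nrow = 5)  # toy 5-gene x 4-sample matrix

# Median-of-ratios: per-sample median of log(count / geometric mean),
# computed over genes with no zero counts (those have finite log geo-means).
logGeoMeans <- rowMeans(log(counts))
ok <- is.finite(logGeoMeans)
sizeFacs <- apply(counts, 2, function(cnts)
  exp(median((log(cnts) - logGeoMeans)[ok])))

normCounts <- t(t(counts) / sizeFacs)  # the same division as above
```

On a real DESeqDataSet this hand-rolled `sizeFacs` should match `sizeFactors(dds)` up to numerical precision (for the default estimator, without gene-specific normalization factors).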
I would like to compare this to the old ("classic") version of edgeR that didn't rely on GLMs. They could have applied a simple division as above, but instead they generated pseudo-counts using quantile-adjusted conditional maximum likelihood (qCML) from Robinson and Smyth (2008), "Small-sample estimation of negative binomial dispersion". It looks like the authors of DESeq/DESeq2 were well aware of the pseudo-count approach but decided to simplify it. Is there any evidence that the simplified version isn't much worse?
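To make the comparison concrete, here is a hedged sketch placing the two schemes side by side on simulated data. It assumes edgeR is installed and that equalizeLibSizes() (the routine classic edgeR's estimateCommonDisp() uses internally) returns the quantile-adjusted pseudo-counts in a `pseudo.counts` component:

```r
library(edgeR)

set.seed(42)
counts <- matrix(rnbinom(4000, mu = 20, size = 1 / 0.1), ncol = 4)
y <- DGEList(counts = counts)
y <- calcNormFactors(y)

# edgeR classic: quantile-adjusted pseudo-counts (qCML)
eq <- equalizeLibSizes(y, dispersion = 0.1)
pseudoCounts <- eq$pseudo.counts

# DESeq2-style alternative: simple division by a per-sample scaling factor
effLibSize <- y$samples$lib.size * y$samples$norm.factors
simpleCounts <- t(t(counts) * (mean(effLibSize) / effLibSize))

# Both matrices are brought to a common library size, but only the
# pseudo-counts are constructed so the adjusted values remain
# (approximately) negative-binomially distributed; the simple ratio
# is exact but changes the distributional form.
```

The last comment is the crux of the question: the quantile adjustment preserves the NB sampling model that the dispersion estimator assumes, while plain division does not.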