Help interpreting DESeq2 method for gene-wide dispersion estimates
Entering edit mode
Last seen 4.5 years ago
USA/Connecticut/UConn and JAX-GM

The DESeq2 paper, page 15 explains the first of the three steps the algorithm uses for finding the dispersion as a function of the mean as follows:

To get a gene-wise dispersion estimate for a gene i, we start by fitting a negative binomial GLM without an LFC prior for the design matrix X to the gene’s count data. This GLM uses a rough method-of-moments estimate of dispersion, based on the within-group variances and means.

I would like to understand this better and I would appreciate more details on this "rough method-of-moments estimate."  In principle I can extract this from the code, but that hasn't proved straightforward.  Even a clear pointer to exactly where this is carried out in the source code would be a big help.

deseq2 negative binomial model glm • 833 views
Entering edit mode
Last seen 9 hours ago
United States

Best to just check the code, we use the minimum of roughDispEstimate() which uses a linear model to fit estimated counts, and momentsDispEst() which just uses the simple mean and variance across all samples and the simple moments based estimate. These are unexported functions in core.R. And remember this is none of the genewise, fitted, or posterior dispersion estimate used in DESeq2, it’s just a rough, stand-in value before the genewise dispersion is estimated.

Entering edit mode

Thanks, this is helpful!  Is there a place where you provide more details on the mathematics of fitting the dispersion function to the data beyond what's in the paper?  Perhaps slides from a course on the mathematical foundations of the algorithm, or something like that?

Entering edit mode

Do you mean, how dispersionFunction is fit?

In ?estimateDispersions we have:

"• parametric - fit a dispersion-mean relation of the form:
dispersion = asymptDisp + extraPois/mean
via a robust gamma-family GLM. The coefficients asymptDisp and extraPois are given"

Then in the DESeq2 paper we say:

"For the trend function, we use the same parametrization as we used for DEXSeq, namely, ..."

Then if you follow the link to the DEXSeq paper you can read:

"To obtain the coefficients a0 and a1, we regress the dispersion estimates An external file that holds a picture, illustration, etc.
Object name is 2008inf12.jpg for all counting bins from all genes on their average normalized count values An external file that holds a picture, illustration, etc.
Object name is 2008inf13.jpg with a gamma-family GLM. To ensure robustness of the fit, we iteratively leave out bins with large residuals until convergence is achieved (Huber 1981)."


Login before adding your answer.

Traffic: 462 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6