Question

Help interpreting DESeq2 method for gene-wide dispersion estimates

0

Entering edit mode

jeremy.teitelbaum • 0

@jeremyteitelbaum-16687

Last seen 5.7 years ago

USA/Connecticut/UConn and JAX-GM

The DESeq2 paper, page 15 explains the first of the three steps the algorithm uses for finding the dispersion as a function of the mean as follows:

To get a gene-wise dispersion estimate for a gene i, we start by fitting a negative binomial GLM without an LFC prior for the design matrix X to the gene’s count data. This GLM uses a rough method-of-moments estimate of dispersion, based on the within-group variances and means.

I would like to understand this better and I would appreciate more details on this "rough method-of-moments estimate." In principle I can extract this from the code, but that hasn't proved straightforward. Even a clear pointer to exactly where this is carried out in the source code would be a big help.

deseq2 negative binomial model glm • 1.2k views

ADD COMMENT • link updated 5.7 years ago by Michael Love 41k • written 5.7 years ago by jeremy.teitelbaum • 0

score 1 · Accepted Answer · 2018-07-28

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 16 minutes ago

United States

Best to just check the code, we use the minimum of roughDispEstimate() which uses a linear model to fit estimated counts, and momentsDispEst() which just uses the simple mean and variance across all samples and the simple moments based estimate. These are unexported functions in core.R. And remember this is none of the genewise, fitted, or posterior dispersion estimate used in DESeq2, it’s just a rough, stand-in value before the genewise dispersion is estimated.

ADD COMMENT • link 5.7 years ago Michael Love 41k

0

Entering edit mode

Thanks, this is helpful! Is there a place where you provide more details on the mathematics of fitting the dispersion function to the data beyond what's in the paper? Perhaps slides from a course on the mathematical foundations of the algorithm, or something like that?

ADD REPLY • link 5.7 years ago jeremy.teitelbaum • 0

0

Entering edit mode

Do you mean, how dispersionFunction is fit?

In ?estimateDispersions we have:

"• parametric - fit a dispersion-mean relation of the form:
dispersion = asymptDisp + extraPois/mean
via a robust gamma-family GLM. The coefficients asymptDisp and extraPois are given"

Then in the DESeq2 paper we say:

"For the trend function, we use the same parametrization as we used for DEXSeq, namely, ..."

Then if you follow the link to the DEXSeq paper you can read:

"To obtain the coefficients a0 and a1, we regress the dispersion estimates for all counting bins from all genes on their average normalized count values with a gamma-family GLM. To ensure robustness of the fit, we iteratively leave out bins with large residuals until convergence is achieved (Huber 1981)."

ADD REPLY • link 5.7 years ago Michael Love 41k