Question: DESeq2 why estimate dispersion on log scale
0
gravatar for xr2120
14 months ago by
xr21200
xr21200 wrote:

Hi, 

I have a question about how DESeq2 estimate dispersion parameters. In the low level function of DESeq2, when estimating dispersion alpha, I think the function is optimizing alpha on log scale (i.e. take the partial derivative with respect to log(alpha)). My questions are:

1. Why not optimizing alpha on original scale?

2. Optimize alpha on log scale vs. optimize alpha on original scale, do we get the same optimal alpha?

Thanks.

 

<embed height="0" id="xunlei_com_thunder_helper_plugin_d462f475-c18e-46be-bd10-327458d045bd" type="application/thunder_download_plugin" width="0"/>

ADD COMMENTlink modified 14 months ago by funnyjokes1.com0 • written 14 months ago by xr21200
Answer: DESeq2 why estimate dispersion on log scale
4
gravatar for Michael Love
14 months ago by
Michael Love24k
United States
Michael Love24k wrote:

You aren't limited by the way to digging through the code, we discuss how we fit the dispersion parameter in detail in the Methods section of the DESeq2 paper:

Estimation of dispersions

"We assume the dispersion parameter α i follows a log-normal prior distribution that is centered around a trend that depends on the gene’s mean normalized read count..."

We optimize the dispersion on the log scale for a number of reasons that are touched on in the Methods. It makes the line search easier, because we can go in either direction, we're not bounded on one side. Also because the distribution of estimated dispersions is approximately normal after taking the log. Here we cite Wu 2012 which was published when we had just started working on the problem, and they also noted that log of estimated dispersion is approximately normal, looking across genes. Another reason is that we came up with a technique to estimate the expected variance of the log dispersion estimate which luckily does not involve the dispersion parameter itself. After taking the log, the scaling factor drops away. We need the expected variance in order to set the prior width: this is a really important part of the model because it decides how much to move the dispersion estimates towards the middle value for genes with similar expression level. It's not a perfect solution, but it works well for residual degrees of freedom > 3.

ADD COMMENTlink written 14 months ago by Michael Love24k

Thank you so much for the detailed explanation. 

<embed height="0" id="xunlei_com_thunder_helper_plugin_d462f475-c18e-46be-bd10-327458d045bd" type="application/thunder_download_plugin" width="0"/>

ADD REPLYlink modified 14 months ago • written 14 months ago by xr21200
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 296 users visited in the last hour