I have a question about how DESeq2 estimates dispersion parameters. In the low-level functions of DESeq2, when estimating the dispersion alpha, I think the function optimizes alpha on the log scale (i.e., it takes the partial derivative with respect to log(alpha)). My questions are:
1. Why not optimize alpha on the original scale?
2. If we optimize alpha on the log scale vs. the original scale, do we get the same optimal alpha?
By the way, you aren't limited to digging through the code; we discuss how we fit the dispersion parameter in detail in the Methods section of the DESeq2 paper:
Estimation of dispersions
"We assume the dispersion parameter α i follows a log-normal prior distribution that is centered around a trend that depends on the gene’s mean normalized read count..."
We optimize the dispersion on the log scale for a number of reasons, which are touched on in the Methods:

1. It makes the line search easier, because on the log scale we can move in either direction; we are not bounded on one side by zero.

2. The distribution of estimated dispersions across genes is approximately normal after taking the log. Here we cite Wu et al. (2012), which was published when we had just started working on the problem; they also noted that the log of the estimated dispersion is approximately normal when looking across genes.

3. We came up with a technique to estimate the expected sampling variance of the log dispersion estimate which, conveniently, does not involve the dispersion parameter itself: after taking the log, the scaling factor drops away. We need this expected variance in order to set the prior width. This is a really important part of the model, because it decides how much to move the dispersion estimates toward the middle value for genes with a similar expression level. It's not a perfect solution, but it works well for residual degrees of freedom > 3.
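To your second question: yes. Because log is a strictly monotone transformation, the alpha that maximizes the likelihood is the same whether you parametrize the search by alpha or by log(alpha); only the numerical behavior of the optimizer differs. Here is a minimal R sketch (a toy single-gene negative binomial likelihood, not the actual DESeq2 internals; nbLogLik is a made-up helper) illustrating this:

```r
## Toy example: the NB log-likelihood for one gene's counts has the same
## maximizer whether we search over alpha or over log(alpha).
set.seed(1)
counts <- rnbinom(n = 20, mu = 100, size = 1 / 0.1)  # true alpha = 0.1

## NB log-likelihood as a function of the dispersion alpha (size = 1/alpha)
nbLogLik <- function(alpha, k, mu) {
  sum(dnbinom(k, mu = mu, size = 1 / alpha, log = TRUE))
}

## search over alpha on the original scale: bounded below by 0
fit.alpha <- optimize(nbLogLik, interval = c(1e-8, 10),
                      k = counts, mu = mean(counts), maximum = TRUE)

## search over log(alpha): unbounded in both directions
fit.logalpha <- optimize(function(la) nbLogLik(exp(la), counts, mean(counts)),
                         interval = log(c(1e-8, 10)), maximum = TRUE)

fit.alpha$maximum          # alpha-hat on the original scale
exp(fit.logalpha$maximum)  # the same alpha-hat, up to numerical tolerance
```

The practical difference shows up in the optimizer itself: on the log scale, a Newton or line-search step can never propose a negative dispersion.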
Thank you so much for the detailed explanation.
<embed height="0" id="xunlei_com_thunder_helper_plugin_d462f475-c18e-46be-bd10-327458d045bd" type="application/thunder_download_plugin" width="0"/>