DESeq2 why estimate dispersion on log scale
2
0
Entering edit mode
xr2120 • 0
@xr2120-13720
Last seen 5.9 years ago

Hi, 

I have a question about how DESeq2 estimate dispersion parameters. In the low level function of DESeq2, when estimating dispersion alpha, I think the function is optimizing alpha on log scale (i.e. take the partial derivative with respect to log(alpha)). My questions are:

1. Why not optimizing alpha on original scale?

2. Optimize alpha on log scale vs. optimize alpha on original scale, do we get the same optimal alpha?

Thanks.

 

<embed height="0" id="xunlei_com_thunder_helper_plugin_d462f475-c18e-46be-bd10-327458d045bd" type="application/thunder_download_plugin" width="0"/>

deseq2 dispersoin estimatedispersions • 1.1k views
ADD COMMENT
4
Entering edit mode
@mikelove
Last seen 2 hours ago
United States

You aren't limited by the way to digging through the code, we discuss how we fit the dispersion parameter in detail in the Methods section of the DESeq2 paper:

Estimation of dispersions

"We assume the dispersion parameter α i follows a log-normal prior distribution that is centered around a trend that depends on the gene’s mean normalized read count..."

We optimize the dispersion on the log scale for a number of reasons that are touched on in the Methods. It makes the line search easier, because we can go in either direction, we're not bounded on one side. Also because the distribution of estimated dispersions is approximately normal after taking the log. Here we cite Wu 2012 which was published when we had just started working on the problem, and they also noted that log of estimated dispersion is approximately normal, looking across genes. Another reason is that we came up with a technique to estimate the expected variance of the log dispersion estimate which luckily does not involve the dispersion parameter itself. After taking the log, the scaling factor drops away. We need the expected variance in order to set the prior width: this is a really important part of the model because it decides how much to move the dispersion estimates towards the middle value for genes with similar expression level. It's not a perfect solution, but it works well for residual degrees of freedom > 3.

ADD COMMENT
0
Entering edit mode

Thank you so much for the detailed explanation. 

<embed height="0" id="xunlei_com_thunder_helper_plugin_d462f475-c18e-46be-bd10-327458d045bd" type="application/thunder_download_plugin" width="0"/>

ADD REPLY

Login before adding your answer.

Traffic: 748 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6