Search
Question: [DESeq2] Best way to select the optimal fitType for dispersion estimates?
0
gravatar for Christoph
2.6 years ago by
Christoph0
Christoph0 wrote:

Dear Michael,

I am using DESeq2 to analyze differential expression between two conditions (RNAseq, 3 replicates each). When playing around with different fit types as parameters for the DESeq() function I found that the default (parametric) fit results in 29 genes with adjusted p-values below my threshold while the local fit gives me 153.

This lead me to the question which of the two I should use and if there is an objective way to make this decision. I obviously don't want to choose one just because it gives me more/better results.

The plots I get from the plotDispEsts() function can be found here:
local fit: http://i.imgur.com/grpWtnv.png
parametric fit: http://i.imgur.com/A79woce.png

It would be great if you could give me some suggestions as to how this decision should be made? If at all possible, I'd like to find a way to assess the fit in a more quantitative way than visual inspection.

Thanks in advance,
Christoph

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Christoph0
2
gravatar for Michael Love
2.6 years ago by
Michael Love20k
United States
Michael Love20k wrote:

hi Christoph,

I'd prefer the local fit here. You could compare the distribution of residuals on the log scale (log gene-est - log disp-fit), and then look at a robust measure of these (e.g. median absolute residual) for a more quantitative approach. Here, it would be expected that the local regression does better, as it has more degrees of freedom compared to the parametric with only two parameters. The reason parametric is default in DESeq2 is because the local fit really deserves a visual inspection.

 

ADD COMMENTlink written 2.6 years ago by Michael Love20k
0
gravatar for Christoph
2.6 years ago by
Christoph0
Christoph0 wrote:

Hi Michael,

thank you very much for your quick and helpful reply!

If I calculate the distribution of residuals as you suggested (log(dispGeneEst) - log(dispFit)) it shows that the median for the local fit is -0.06 vs -0.41 for the parametric fit. If I understood this correctly, this would then be in line with the local regression giving the better fit since the difference between fit and estimate is generally smaller?
 

ADD COMMENTlink written 2.6 years ago by Christoph0
1

Note I suggested: median absolute residual. This absolute part is important, it's like squaring in the calculation of SD.

ADD REPLYlink written 2.6 years ago by Michael Love20k

Ah yes, I forgot to take the absolute value. Now the difference is still there but less obvious (0.61 vs 0.68).

In any case, I'll definitely use this in the future (together with the visual inspection) to determine which fit to choose.

Thanks again!

ADD REPLYlink written 2.6 years ago by Christoph0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 405 users visited in the last hour