Search
Question: [DESeq2] Best way to select the optimal fitType for dispersion estimates?
0
2.6 years ago by
Christoph0 wrote:

Dear Michael,

I am using DESeq2 to analyze differential expression between two conditions (RNAseq, 3 replicates each). When playing around with different fit types as parameters for the DESeq() function I found that the default (parametric) fit results in 29 genes with adjusted p-values below my threshold while the local fit gives me 153.

This lead me to the question which of the two I should use and if there is an objective way to make this decision. I obviously don't want to choose one just because it gives me more/better results.

The plots I get from the plotDispEsts() function can be found here:
local fit: http://i.imgur.com/grpWtnv.png
parametric fit: http://i.imgur.com/A79woce.png

It would be great if you could give me some suggestions as to how this decision should be made? If at all possible, I'd like to find a way to assess the fit in a more quantitative way than visual inspection.

Christoph

modified 2.6 years ago • written 2.6 years ago by Christoph0
2
2.6 years ago by
Michael Love20k
United States
Michael Love20k wrote:

hi Christoph,

I'd prefer the local fit here. You could compare the distribution of residuals on the log scale (log gene-est - log disp-fit), and then look at a robust measure of these (e.g. median absolute residual) for a more quantitative approach. Here, it would be expected that the local regression does better, as it has more degrees of freedom compared to the parametric with only two parameters. The reason parametric is default in DESeq2 is because the local fit really deserves a visual inspection.

0
2.6 years ago by
Christoph0 wrote:

Hi Michael,

If I calculate the distribution of residuals as you suggested (log(dispGeneEst) - log(dispFit)) it shows that the median for the local fit is -0.06 vs -0.41 for the parametric fit. If I understood this correctly, this would then be in line with the local regression giving the better fit since the difference between fit and estimate is generally smaller?

1

Note I suggested: median absolute residual. This absolute part is important, it's like squaring in the calculation of SD.

Ah yes, I forgot to take the absolute value. Now the difference is still there but less obvious (0.61 vs 0.68).

In any case, I'll definitely use this in the future (together with the visual inspection) to determine which fit to choose.

Thanks again!