[DESeq2] Best way to select the optimal fitType for dispersion estimates?
2
1
Entering edit mode
Christoph ▴ 10
@christoph-10148
Last seen 3.7 years ago

Dear Michael,

I am using DESeq2 to analyze differential expression between two conditions (RNAseq, 3 replicates each). When playing around with different fit types as parameters for the DESeq() function I found that the default (parametric) fit results in 29 genes with adjusted p-values below my threshold while the local fit gives me 153.

This lead me to the question which of the two I should use and if there is an objective way to make this decision. I obviously don't want to choose one just because it gives me more/better results.

The plots I get from the plotDispEsts() function can be found here:
local fit: http://i.imgur.com/grpWtnv.png
parametric fit: http://i.imgur.com/A79woce.png

It would be great if you could give me some suggestions as to how this decision should be made? If at all possible, I'd like to find a way to assess the fit in a more quantitative way than visual inspection.

Christoph

deseq2 • 4.9k views
4
Entering edit mode
@mikelove
Last seen 1 day ago
United States

hi Christoph,

I'd prefer the local fit here. You could compare the distribution of residuals on the log scale (log gene-est - log disp-fit), and then look at a robust measure of these (e.g. median absolute residual) for a more quantitative approach. Here, it would be expected that the local regression does better, as it has more degrees of freedom compared to the parametric with only two parameters. The reason parametric is default in DESeq2 is because the local fit really deserves a visual inspection.

0
Entering edit mode

Hi Michael,

although I understand the principal of this approach I am having troubles implementing the computation into R. Could you share some code snippets how to compute the median of the absolute residuals for the following example?

require(DESeq2)
DDS <- makeExampleDESeqDataSet()
DDS <- estimateSizeFactors(DDS)
par <- estimateDispersions(DDS, fitType = "parametric")
loc <- estimateDispersions(DDS, fitType = "local")
plotDispEsts(par, main= "dispEst: parametric")
plotDispEsts(loc, main= "dispEst: local")


0
Entering edit mode

The residuals are:

residual <- mcols(dds)$dispGeneEst - mcols(dds)$dispFit

0
Entering edit mode
Christoph ▴ 10
@christoph-10148
Last seen 3.7 years ago

Hi Michael,

If I calculate the distribution of residuals as you suggested (log(dispGeneEst) - log(dispFit)) it shows that the median for the local fit is -0.06 vs -0.41 for the parametric fit. If I understood this correctly, this would then be in line with the local regression giving the better fit since the difference between fit and estimate is generally smaller?

2
Entering edit mode

Note I suggested: median absolute residual. This absolute part is important, it's like squaring in the calculation of SD.

0
Entering edit mode

Ah yes, I forgot to take the absolute value. Now the difference is still there but less obvious (0.61 vs 0.68).

In any case, I'll definitely use this in the future (together with the visual inspection) to determine which fit to choose.

Thanks again!

0
Entering edit mode

I have shifted my question as comment to Michael's post. :)