Question

[DESeq2] Best way to select the optimal fitType for dispersion estimates?

1

Entering edit mode

Christoph ▴ 10

@christoph-10148

Last seen 6.5 years ago

Dear Michael,

I am using DESeq2 to analyze differential expression between two conditions (RNAseq, 3 replicates each). When playing around with different fit types as parameters for the DESeq() function I found that the default (parametric) fit results in 29 genes with adjusted p-values below my threshold while the local fit gives me 153.

This lead me to the question which of the two I should use and if there is an objective way to make this decision. I obviously don't want to choose one just because it gives me more/better results.

The plots I get from the plotDispEsts() function can be found here:
local fit: http://i.imgur.com/grpWtnv.png
parametric fit: http://i.imgur.com/A79woce.png

It would be great if you could give me some suggestions as to how this decision should be made? If at all possible, I'd like to find a way to assess the fit in a more quantitative way than visual inspection.

Thanks in advance,
Christoph

deseq2 • 10k views

ADD COMMENT • link updated 3.4 years ago by Michael Love 41k • written 8.0 years ago by Christoph ▴ 10

score 5 · Answer 1 · 2016-04-21

5

Entering edit mode

Michael Love 41k

@mikelove

Last seen 1 day ago

United States

hi Christoph,

I'd prefer the local fit here. You could compare the distribution of residuals on the log scale (log gene-est - log disp-fit), and then look at a robust measure of these (e.g. median absolute residual) for a more quantitative approach. Here, it would be expected that the local regression does better, as it has more degrees of freedom compared to the parametric with only two parameters. The reason parametric is default in DESeq2 is because the local fit really deserves a visual inspection.

ADD COMMENT • link 8.0 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

although I understand the principal of this approach I am having troubles implementing the computation into R. Could you share some code snippets how to compute the median of the absolute residuals for the following example?

require(DESeq2)
DDS <- makeExampleDESeqDataSet()
DDS <- estimateSizeFactors(DDS)
par <- estimateDispersions(DDS, fitType = "parametric")
loc <- estimateDispersions(DDS, fitType = "local")
plotDispEsts(par, main= "dispEst: parametric")
plotDispEsts(loc, main= "dispEst: local")

Thanks already in advance :)

ADD REPLY • link 3.4 years ago Hannes • 0

0

Entering edit mode

The residuals are:

residual <- mcols(dds)$dispGeneEst - mcols(dds)$dispFit

ADD REPLY • link 3.4 years ago Michael Love 41k

score 0 · Answer 2 · 2016-04-22

0

Entering edit mode

Christoph ▴ 10

@christoph-10148

Last seen 6.5 years ago

Hi Michael,

thank you very much for your quick and helpful reply!

If I calculate the distribution of residuals as you suggested (log(dispGeneEst) - log(dispFit)) it shows that the median for the local fit is -0.06 vs -0.41 for the parametric fit. If I understood this correctly, this would then be in line with the local regression giving the better fit since the difference between fit and estimate is generally smaller?

ADD COMMENT • link 8.0 years ago Christoph ▴ 10

2

Entering edit mode

Note I suggested: median absolute residual. This absolute part is important, it's like squaring in the calculation of SD.

ADD REPLY • link 8.0 years ago Michael Love 41k

0

Entering edit mode

Ah yes, I forgot to take the absolute value. Now the difference is still there but less obvious (0.61 vs 0.68).

In any case, I'll definitely use this in the future (together with the visual inspection) to determine which fit to choose.