I have a RNASeq data set with ~650 samples. When I run DESeq2, I get a very poor fit of the mean-dispersion trend:
Here is also a plot of the original trend (without final dispersion estimates), and color coded the number of dispersion iterations:
This is the output of DESeq:
In DESeqDataSet(se, design = ~condition + run) : some variables in design formula are characters, converting to factors estimating size factors estimating dispersions gene-wise dispersion estimates: 64 workers mean-dispersion relationship -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time. final dispersion estimates, fitting model and testing: 64 workers -- replacing outliers and refitting for 8367 genes -- DESeq argument 'minReplicatesForReplace' = 7 -- original counts are preserved in counts(dds) estimating dispersions fitting model and testing
I also tried to supply a custom fit as described in the vignette:
dds <- estimateSizeFactors(dds) dds <- estimateDispersionsGeneEst(dds) useForMedian <- 1e-7 < mcols(dds)$dispGeneEst & mcols(dds)$dispGeneEst < 1e1 medianDisp <- median(mcols(dds)$dispGeneEst[useForMedian], na.rm=TRUE) dispersionFunction(dds) <- function(mu) medianDisp dds <- estimateDispersionsMAP(dds) dds <- nbinomWaldTest(dds)
But this resulted in an even worse fit:
Do you have any suggestions how to improve the fit?