Question: Asymptotic dispersion, DESeq2
2.7 years ago by
stevenn.volant0 wrote:

Hi,

Does anyone know the asymptotic behavior (i.e. with a large number of samples) of the dispersion estimate?

Thank you

modified 2.7 years ago by Michael Love19k • written 2.7 years ago by stevenn.volant0
2.7 years ago by
Michael Love19k
United States
Michael Love19k wrote:

DESeq2's dispersion estimator is the posterior mode. This converges to the unbiased 'maximum of the Cox-Reid adjusted likelihood' estimator as the sample size grows to infinity (see the Methods section of the DESeq2 paper, which references the edgeR paper on this adjustment).
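
A rough way to see why the prior's influence fades, written in the notation I recall from the DESeq2 paper's Methods (treat the exact symbols as my assumption and check them against the paper): the log posterior for the dispersion of gene i is the Cox-Reid adjusted log likelihood plus a log-normal prior term centered on the fitted trend.

```latex
\begin{align*}
\log p(\alpha_i \mid y_i)
  &= \ell_{CR}(\alpha_i;\, \mu_i, y_i) + \Lambda_i(\alpha_i) + \text{const}, \\
\ell_{CR}(\alpha;\, \mu, y)
  &= \sum_{j=1}^{m} \log f_{NB}(y_{ij};\, \mu_{ij}, \alpha)
     - \tfrac{1}{2} \log \det\!\left(X^{\top} W X\right), \\
\Lambda_i(\alpha)
  &= -\frac{\left(\log \alpha - \log \alpha_{tr}(\bar{\mu}_i)\right)^2}{2 \sigma_d^2}.
\end{align*}
```

The sum in the likelihood has one term per sample, so it grows roughly linearly in m, while the prior term does not depend on m at all. For large m the likelihood dominates and the posterior mode moves to the maximum of the Cox-Reid adjusted likelihood.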

Keep in mind that, like the sample variance, the MLE for the dispersion takes longer to converge to the true value than estimators for the mean do. This is why sharing information across genes (using the prior distribution for genes with a similar mean value) is such a good idea and improves inference.
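
To give a feel for the difference in convergence speed, here is a small base-R sketch. Note that it is not what DESeq2 does internally: it uses a simple method-of-moments dispersion estimate, (var - mean) / mean^2, as a stand-in, and the names `true.mean`, `n.rep` and `sizes` are just illustrative choices. It reports the spread of the relative estimates of the mean and of the dispersion over repeated simulations.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
true.mean <- 200
true.disp <- 0.1
n.rep <- 500         # simulated data sets per sample size
sizes <- c(5, 20, 100)

# For each sample size, simulate n.rep negative binomial samples and
# record the SD of estimate / truth for the mean and for the dispersion.
rel.sd <- sapply(sizes, function(m) {
  est <- replicate(n.rep, {
    y <- rnbinom(m, mu = true.mean, size = 1 / true.disp)
    mu.hat <- mean(y)
    # method-of-moments dispersion (a stand-in, not DESeq2's estimator)
    disp.hat <- (var(y) - mu.hat) / mu.hat^2
    c(mean = mu.hat / true.mean, disp = disp.hat / true.disp)
  })
  apply(est, 1, sd)
})
colnames(rel.sd) <- sizes
print(round(rel.sd, 2))
```

Each column of the printed matrix is a sample size. The "disp" row should stay well above the "mean" row at every sample size, which is the gap that borrowing information across genes helps to close.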

Here's a toy example and a plot showing the posterior mode converging to the true value (orange) although it starts around the center of the prior (purple).

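library(DESeq2)

set.seed(1)  # added for reproducibility; any seed works

# sample sizes to try: 3..12, then 20..100 by 10, then 125..300 by 25
samp.size <- c(3:12,
               2:10 * 10,
               5:12 * 25)
disps <- numeric(length(samp.size))
prior.mean <- .2  # dispersion of the simulated genes, so the center of the prior
true.disp <- .1   # true dispersion of the one gene we track

for (i in seq_along(samp.size)) {
  cat(i)  # progress indicator
  # 100 genes that all have dispersion prior.mean
  dds <- makeExampleDESeqDataSet(n=100, m=samp.size[i],
                                 dispMeanRel=function(x) prior.mean)
  # overwrite gene 1 with counts drawn using the true dispersion
  cnts <- rnbinom(ncol(dds), mu=200, size=1/true.disp)
  mode(cnts) <- "integer"
  counts(dds)[1,] <- cnts
  sizeFactors(dds) <- rep(1, ncol(dds))
  dds <- estimateDispersions(dds, quiet=TRUE, fitType="mean")
  # final (posterior mode) dispersion estimate for gene 1
  disps[i] <- dispersions(dds)[1]
}

plot(samp.size, disps, log="y")
abline(h=prior.mean, col="purple")  # center of the prior
abline(h=true.disp, col="orange")   # true value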