Hi,
Does anyone know the asymptotic behavior (i.e., with a large number of samples) of the dispersion estimate?
Thank you
DESeq2's estimator is the posterior mode (the MAP estimate). As the sample size grows to infinity, it converges to the bias-corrected estimator that maximizes the Cox-Reid adjusted likelihood (see the Methods section of the DESeq2 paper, which references the edgeR paper on this adjustment).
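To make the limit concrete, here is the objective as I recall it from the Methods section of the DESeq2 paper; treat the notation as my paraphrase and check it against the paper. The symbols are: $\ell$ is the negative binomial log-likelihood of the counts $K_{ij}$ for gene $i$, $X$ is the design matrix, $W$ is the diagonal weight matrix of the GLM fit, $\alpha_{tr}(\bar\mu_i)$ is the fitted trend value at the gene's mean, and $\sigma_d^2$ is the prior variance.

```latex
\begin{align}
\ell_{CR}(\alpha) &= \ell(\alpha) - \tfrac{1}{2}\log\det\!\left(X^{t} W X\right),
  \qquad \ell(\alpha) = \sum_{j=1}^{m} \log f_{NB}(K_{ij};\, \mu_{ij}, \alpha) \\
\Lambda_i(\alpha) &= -\frac{\left(\log\alpha - \log\alpha_{tr}(\bar\mu_i)\right)^{2}}{2\sigma_d^{2}} \\
\alpha_i^{MAP} &= \arg\max_{\alpha}\,\bigl(\ell_{CR}(\alpha) + \Lambda_i(\alpha)\bigr)
\end{align}
```

The likelihood term is a sum over the $m$ samples, so it grows with $m$, while the prior term $\Lambda_i$ does not depend on $m$. For large $m$ the prior's contribution becomes negligible and the maximizer approaches that of $\ell_{CR}$ alone.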
Keep in mind that, like the sample variance, the MLE for the dispersion takes longer to converge to the true value than estimators for the mean do. This is why sharing information across genes (using the prior distribution for genes with a similar mean value) is such a good idea: it improves inference when the sample size is small.
Here's a toy example and a plot showing the posterior mode converging to the true value (orange line), although it starts near the center of the prior (purple line).
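library(DESeq2)

# sample sizes to try: 3..12, 20..100 by 10, 125..300 by 25
samp.size <- c(3:12, 2:10 * 10, 5:12 * 25)
disps <- numeric(length(samp.size))
prior.mean <- .2   # center of the prior: all simulated genes have dispersion 0.2
true.disp <- .1    # true dispersion of the one gene we track

for (i in seq_along(samp.size)) {
  cat(i)  # progress indicator
  # simulate 100 genes whose dispersion equals the prior mean
  dds <- makeExampleDESeqDataSet(n=100, m=samp.size[i],
                                 dispMeanRel=function(x) prior.mean)
  # overwrite gene 1 with counts drawn using the true dispersion
  cnts <- rnbinom(ncol(dds), mu=200, size=1/true.disp)
  mode(cnts) <- "integer"
  counts(dds)[1,] <- cnts
  sizeFactors(dds) <- rep(1, ncol(dds))
  dds <- estimateDispersions(dds, quiet=TRUE, fitType="mean")
  # final (posterior mode) dispersion estimate for gene 1
  disps[i] <- dispersions(dds)[1]
}

plot(samp.size, disps, log="y")
abline(h=prior.mean, col="purple")
abline(h=true.disp, col="orange")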
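On the point about the dispersion converging more slowly than the mean, a rough base-R sketch may help. It does not use DESeq2 and does not compute the MLE: as a stand-in it uses a simple method-of-moments dispersion estimate, solving Var = mu + disp * mu^2 for disp. The variable names, the number of replicates, and the sample sizes are arbitrary choices for illustration.

```r
set.seed(1)
true.mean <- 200
true.disp <- 0.1
samp.size <- c(5, 20, 100, 500)
n.rep <- 2000  # simulated data sets per sample size

# relative RMSE of the sample mean and of a method-of-moments dispersion
rel.rmse <- sapply(samp.size, function(m) {
  est <- replicate(n.rep, {
    y <- rnbinom(m, mu = true.mean, size = 1 / true.disp)
    mu.hat <- mean(y)
    # method of moments (an assumption here, not DESeq2's estimator):
    # Var(y) = mu + disp * mu^2  =>  disp = (Var(y) - mu) / mu^2
    disp.hat <- (var(y) - mu.hat) / mu.hat^2
    c(mean = mu.hat, disp = disp.hat)
  })
  c(mean = sqrt(mean((est["mean", ] - true.mean)^2)) / true.mean,
    disp = sqrt(mean((est["disp", ] - true.disp)^2)) / true.disp)
})
colnames(rel.rmse) <- samp.size
print(round(rel.rmse, 3))
```

Each column is a sample size. The relative error of the dispersion estimate should be several times larger than that of the mean at every sample size, which is the gap that the prior helps to close when there are few samples.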