Question

Asymptotic dispersion, DESeq2


stevenn.volant • 0


Hi,

Does anyone know the asymptotic behavior (i.e. with a large number of samples) of the dispersion estimate?

Thank you


updated 8.2 years ago by Michael Love • written 8.2 years ago by stevenn.volant

score 2 · Accepted Answer · 2016-01-27

DESeq2's dispersion estimator is the posterior mode. As the sample size grows to infinity, it converges to the unbiased 'maximum of the Cox-Reid adjusted likelihood' estimator (see the Methods section of the DESeq2 paper, which cites the edgeR paper on this adjustment).
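As a rough sketch of why this happens, the objective can be written as below. The notation follows the DESeq2 paper's Methods as far as I recall it, so treat the symbols as assumptions rather than a quotation: \alpha is the dispersion of one gene, X the design matrix, W the diagonal weight matrix from the GLM fit, \alpha_{tr}(\bar\mu) the fitted trend value at the gene's mean, and \sigma_d^2 the prior variance on the log scale.

```latex
\hat\alpha^{\mathrm{MAP}}
  = \arg\max_{\alpha}\,\bigl[\ell_{\mathrm{CR}}(\alpha) + \Lambda(\alpha)\bigr],
\qquad
\ell_{\mathrm{CR}}(\alpha)
  = \ell(\alpha) - \tfrac{1}{2}\log\det\!\bigl(X^{\top} W X\bigr),
\qquad
\Lambda(\alpha)
  = -\frac{\bigl(\log\alpha - \log\alpha_{tr}(\bar\mu)\bigr)^{2}}{2\sigma_d^{2}}
```

The log-likelihood \ell(\alpha) is a sum over samples, so it grows with the sample size, while the log-prior \Lambda(\alpha) does not. With many samples the likelihood term dominates and the posterior mode approaches the maximizer of \ell_{\mathrm{CR}}.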

Keep in mind that, like the sample variance, the MLE for the dispersion takes longer to converge to the true value than estimators for the mean do. This is why sharing information across genes (using the prior distribution for genes with a similar mean value) is such a good idea and improves inference.
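To get a feel for the difference in convergence speed, here is a minimal base-R sketch. It is an illustration under stated assumptions, not what DESeq2 does internally: it uses a simple method-of-moments dispersion estimate, (var - mean) / mean^2, instead of the Cox-Reid adjusted MLE, and the sample sizes and replicate count are arbitrary choices.

```r
set.seed(1)
true.mu   <- 200
true.disp <- 0.1
n.sizes   <- c(10, 100, 1000)  # arbitrary sample sizes for illustration
n.rep     <- 500               # simulated data sets per sample size

# Relative RMSE of the mean and dispersion estimates at each sample size
rel.rmse <- sapply(n.sizes, function(n) {
  est <- replicate(n.rep, {
    y <- rnbinom(n, mu = true.mu, size = 1 / true.disp)
    m <- mean(y)
    # Method-of-moments estimate (an assumption; not DESeq2's estimator)
    c(mu = m, disp = (var(y) - m) / m^2)
  })
  c(mu   = sqrt(mean((est["mu", ]   - true.mu)^2))   / true.mu,
    disp = sqrt(mean((est["disp", ] - true.disp)^2)) / true.disp)
})
colnames(rel.rmse) <- n.sizes
print(round(rel.rmse, 3))
```

Each column of rel.rmse is one sample size. The "disp" row should stay well above the "mu" row at every sample size, which is the gap that the prior helps to close when there are few samples.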

Here's a toy example and a plot showing the posterior mode converging to the true value (orange) although it starts around the center of the prior (purple).

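library(DESeq2)
# sample sizes to try: 3..12, 20..100 by 10, 125..300 by 25
samp.size <- c(3:12,
               2:10 * 10,
               5:12 * 25)
disps <- numeric(length(samp.size))
prior.mean <- .2  # dispersion used for the simulated background genes (center of the prior)
true.disp <- .1   # true dispersion of the gene of interest
for (i in seq_along(samp.size)) {
  cat(i)  # progress indicator
  # simulate 100 genes that all have dispersion prior.mean
  dds <- makeExampleDESeqDataSet(n=100, m=samp.size[i],
                                 dispMeanRel=function(x) prior.mean)
  # overwrite gene 1 with counts drawn using the true dispersion
  cnts <- rnbinom(ncol(dds), mu=200, size=1/true.disp)
  mode(cnts) <- "integer"
  counts(dds)[1,] <- cnts
  sizeFactors(dds) <- rep(1, ncol(dds))
  dds <- estimateDispersions(dds, quiet=TRUE, fitType="mean")
  # final (posterior mode) dispersion estimate for gene 1
  disps[i] <- dispersions(dds)[1]
}
plot(samp.size, disps, log="y")
abline(h=prior.mean, col="purple")  # center of the prior
abline(h=true.disp, col="orange")   # true value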