Question: Bypass dispersion estimation in DESeq2
gravatar for stevenn.volant
3 months ago by
stevenn.volant0 wrote:


I use the DESeq2 package for many projects in which the number of samples is quite small. But, sometimes i have some projects with almost 700 samples and i want to do the same kind of analysis for all the projects.

When the number of samples is huge the estimation of the dispersion is very time consuming. In this specific case, i think that using the estimation providing by the parametric or local regression is not required, the gene-wise dispersion can be used (or something close to this estimation).

So my question is, is there a way to by pass the estimation and directly use the gene-wise dispersion ?


ADD COMMENTlink modified 3 months ago by Michael Love16k • written 3 months ago by stevenn.volant0
gravatar for Michael Love
3 months ago by
Michael Love16k
United States
Michael Love16k wrote:

hi Stevenn,

Yes, you are right that the MAP will be nearly the same as the MLE with hundreds of residual degrees of freedom. To use the MLE instead of calculating the MAP you can use:

dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
dds <- nbinomWaldTest(dds) # or nbinomLRT()

But also, aside from the above, you can use some simple wrappers to parallelize many steps in DESeq2. This is what occurs within DESeq() when you use parallel=TRUE, but I'm trying to clean up the function now so it's easier for users to copy the code and do it on their own. There may be some better techniques for memory management which I'm actively exploring.

The above steps can be distributed to any number of cores, e.g.,

idx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds))))
dds.list <- lapply(levels(idx), function(l) dds[idx==l,])
dds.list <- bplapply(dds.list, function(d) {
  # some code to run on 'd'
  # a function that should NOT go here is estimateDispersionsFit()
  # ...that function needs to see all of the rows of 'dds' together
dds <-, dds.list)
ADD COMMENTlink written 3 months ago by Michael Love16k

If this is becoming a frequent enough use case, could the above (first code chunk) be exposed as an option in the DESeq() wrapper?

ADD REPLYlink written 3 months ago by Wolfgang Huber13k

That's a reasonable suggestion. DESeq2 then reduces to MLE dispersion with the GLM, which you could just as well get with e.g. glm.nb() with log of size factors as offset, but on the other hand you then have an object that can be input to lfcShrink() to get moderated fold changes, which is novel.

ADD REPLYlink written 3 months ago by Michael Love16k

While we’re at it, I think we should retire the automatic “no replicates” work-around, as the warning and wording surrounding this is already so strongly discouraging. Users can always do this manually, but we would no longer be facilitating.

ADD REPLYlink written 3 months ago by Michael Love16k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 163 users visited in the last hour