Search
Question: Bypass dispersion estimation in DESeq2
0
gravatar for stevenn.volant
14 days ago by
stevenn.volant0 wrote:

Hello,

I use the DESeq2 package for many projects in which the number of samples is quite small. But, sometimes i have some projects with almost 700 samples and i want to do the same kind of analysis for all the projects.

When the number of samples is huge the estimation of the dispersion is very time consuming. In this specific case, i think that using the estimation providing by the parametric or local regression is not required, the gene-wise dispersion can be used (or something close to this estimation).

So my question is, is there a way to by pass the estimation and directly use the gene-wise dispersion ?

Best

ADD COMMENTlink modified 14 days ago by Michael Love15k • written 14 days ago by stevenn.volant0
1
gravatar for Michael Love
14 days ago by
Michael Love15k
United States
Michael Love15k wrote:

hi Stevenn,

Yes, you are right that the MAP will be nearly the same as the MLE with hundreds of residual degrees of freedom. To use the MLE instead of calculating the MAP you can use:

dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
dds <- nbinomWaldTest(dds) # or nbinomLRT()

But also, aside from the above, you can use some simple wrappers to parallelize many steps in DESeq2. This is what occurs within DESeq() when you use parallel=TRUE, but I'm trying to clean up the function now so it's easier for users to copy the code and do it on their own. There may be some better techniques for memory management which I'm actively exploring.

The above steps can be distributed to any number of cores, e.g.,

idx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds))))
dds.list <- lapply(levels(idx), function(l) dds[idx==l,])
dds.list <- bplapply(dds.list, function(d) {
  # some code to run on 'd'
  # a function that should NOT go here is estimateDispersionsFit()
  # ...that function needs to see all of the rows of 'dds' together
})
dds <- do.call(rbind, dds.list)
ADD COMMENTlink written 14 days ago by Michael Love15k

If this is becoming a frequent enough use case, could the above (first code chunk) be exposed as an option in the DESeq() wrapper?

ADD REPLYlink written 14 days ago by Wolfgang Huber13k

That's a reasonable suggestion. DESeq2 then reduces to MLE dispersion with the GLM, which you could just as well get with e.g. glm.nb() with log of size factors as offset, but on the other hand you then have an object that can be input to lfcShrink() to get moderated fold changes, which is novel.

ADD REPLYlink written 14 days ago by Michael Love15k

While we’re at it, I think we should retire the automatic “no replicates” work-around, as the warning and wording surrounding this is already so strongly discouraging. Users can always do this manually, but we would no longer be facilitating.

ADD REPLYlink written 14 days ago by Michael Love15k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 105 users visited in the last hour