Bypass dispersion estimation in DESeq2
1
0
Entering edit mode
@stevennvolant-9599
Last seen 6.2 years ago

Hello,

I use the DESeq2 package for many projects in which the number of samples is quite small. But, sometimes i have some projects with almost 700 samples and i want to do the same kind of analysis for all the projects.

When the number of samples is huge the estimation of the dispersion is very time consuming. In this specific case, i think that using the estimation providing by the parametric or local regression is not required, the gene-wise dispersion can be used (or something close to this estimation).

So my question is, is there a way to by pass the estimation and directly use the gene-wise dispersion ?

Best

deseq2 • 762 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 6 hours ago
United States

hi Stevenn,

Yes, you are right that the MAP will be nearly the same as the MLE with hundreds of residual degrees of freedom. To use the MLE instead of calculating the MAP you can use:

dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
dds <- nbinomWaldTest(dds) # or nbinomLRT()

But also, aside from the above, you can use some simple wrappers to parallelize many steps in DESeq2. This is what occurs within DESeq() when you use parallel=TRUE, but I'm trying to clean up the function now so it's easier for users to copy the code and do it on their own. There may be some better techniques for memory management which I'm actively exploring.

The above steps can be distributed to any number of cores, e.g.,

idx <- factor(sort(rep(seq_len(nworkers),length=nrow(dds))))
dds.list <- lapply(levels(idx), function(l) dds[idx==l,])
dds.list <- bplapply(dds.list, function(d) {
  # some code to run on 'd'
  # a function that should NOT go here is estimateDispersionsFit()
  # ...that function needs to see all of the rows of 'dds' together
})
dds <- do.call(rbind, dds.list)
ADD COMMENT
0
Entering edit mode

If this is becoming a frequent enough use case, could the above (first code chunk) be exposed as an option in the DESeq() wrapper?

ADD REPLY
0
Entering edit mode

That's a reasonable suggestion. DESeq2 then reduces to MLE dispersion with the GLM, which you could just as well get with e.g. glm.nb() with log of size factors as offset, but on the other hand you then have an object that can be input to lfcShrink() to get moderated fold changes, which is novel.

ADD REPLY
0
Entering edit mode

While we’re at it, I think we should retire the automatic “no replicates” work-around, as the warning and wording surrounding this is already so strongly discouraging. Users can always do this manually, but we would no longer be facilitating.

ADD REPLY

Login before adding your answer.

Traffic: 615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6