Question

How do I turn off dispersion in DESeq2?

hans • 0

@hans-9735

Last seen 10.0 years ago

In DESeq (1, as it were) there's a way to "turn off" dispersion. Can this be done in DESeq2 as well? It might be helpful when sample sizes are large (I have a data set with 66 samples) and dispersion may mask some "real" variation.

Thanks

hans hofmann

deseq2 dispersion • 1.6k views

updated 10.0 years ago by Michael Love 43k • written 10.0 years ago by hans • 0

score 0 · Answer 1 · 2016-02-16

I assume you mean turning off the sharing of dispersion information across genes and using the per-gene dispersion values instead. Of course, you absolutely need some kind of estimate of dispersion, because the variance of counts across biological replicates around the expected value is often much higher than what you would expect from a Poisson random variable (dispersion=0).
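To make the "dispersion=0" remark concrete, here is the variance relation for the negative binomial model as DESeq2 parameterizes it. The symbols are my notation rather than anything from the question: K_ij is the count for gene i in sample j, mu_ij its expected value, and alpha_i the gene's dispersion.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\operatorname{Var}(K_{ij}) = \mu_{ij} + \alpha_i \, \mu_{ij}^{2}
```

Setting alpha_i = 0 leaves Var(K_ij) = mu_ij, which is the Poisson case. Any positive alpha_i adds a term that grows with the square of the mean, which is why highly expressed genes show much more replicate-to-replicate variance than a Poisson model allows.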

It is possible to use per gene dispersion like so:

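library("DESeq2")
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds) # per-gene estimates, stored in mcols(dds)$dispGeneEst
dds <- estimateDispersionsFit(dds) # this is needed for some internal stuff
dispersions(dds) <- mcols(dds)$dispGeneEst # use the per-gene values as the final dispersions
dds <- nbinomWaldTest(dds)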

When would you want to do this? I wouldn't actually recommend it for general use. Unlike DESeq (1), DESeq2 uses a posterior mode as the dispersion estimator, which means that the estimator converges to the per-gene value as the sample size increases. This is accomplished in a statistically "principled" way (following Bayes' theorem, and using the empirical prior estimated across all genes). So just running DESeq() should take care of everything for you. But you are free to try it out using the above code.
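If you want to see how much the two approaches actually differ, a rough sketch along these lines may help. It runs the default pipeline and the per-gene pipeline side by side on simulated counts from makeExampleDESeqDataSet(). The simulated data, the object names (dds_map, dds_gene, and so on) and the 0.1 cutoff are assumptions for illustration only, so substitute your own DESeqDataSet.

```r
library("DESeq2")

# Simulated counts for illustration only (not the poster's data)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Default pipeline: final dispersions are the shrunken (posterior mode) values
dds_map <- DESeq(dds, quiet = TRUE)

# Per-gene pipeline: overwrite the final dispersions with the gene-wise estimates
dds_gene <- estimateSizeFactors(dds)
dds_gene <- estimateDispersionsGeneEst(dds_gene)
dds_gene <- estimateDispersionsFit(dds_gene)
dispersions(dds_gene) <- mcols(dds_gene)$dispGeneEst
dds_gene <- nbinomWaldTest(dds_gene)

res_map <- results(dds_map)
res_gene <- results(dds_gene)

# Number of genes passing an (arbitrary) adjusted p-value cutoff in each run
n_sig_map <- sum(res_map$padj < 0.1, na.rm = TRUE)
n_sig_gene <- sum(res_gene$padj < 0.1, na.rm = TRUE)

# Agreement between the two sets of dispersions on the log scale
disp_cor <- cor(log(dispersions(dds_map)), log(dispersions(dds_gene)),
                use = "complete.obs")
```

Increasing m in the simulation is one way to watch the posterior mode estimates move toward the per-gene values, as described above.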

Why use DESeq2 when you have many samples? You could also use a weighted linear model (e.g., limma-voom), and I do in fact recommend this for users when there are 100s of samples. Another benefit of using DESeq2 is the moderated estimates of log fold change (LFC) when there is a small to medium number of samples. If you want to use the entire vector of LFCs for visualization (plotting LFC across experiments) or downstream analysis, we observed a benefit to the moderated LFC up to sample sizes of 20 (10 per group). Note that, when filtering on adjusted p-value, moderated LFC and pseudocount-based LFC are likely to have nearly the same MSE, and that this result depends on the distribution of true LFC.
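For anyone who has not used the limma-voom route mentioned above, a minimal sketch could look like the following. The count matrix, the two-group design and all object names are made up for illustration, and it assumes the limma and edgeR packages are installed. Your own design matrix will differ.

```r
library("limma")
library("edgeR")

# Hypothetical input: simulated counts for 1000 genes and 20 samples in two groups
set.seed(1)
counts <- matrix(rnbinom(1000 * 20, mu = 100, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:20)))
group <- factor(rep(c("A", "B"), each = 10))
design <- model.matrix(~ group)

# Normalize library sizes, then let voom estimate precision weights
dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge)
v <- voom(dge, design)

# Weighted linear model with empirical Bayes moderation of the variances
fit <- lmFit(v, design)
fit <- eBayes(fit)
tab <- topTable(fit, coef = "groupB", number = Inf)
```

Here "groupB" is simply the coefficient name that model.matrix() generates for this toy design.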

[1] Supp Fig 13-14 http://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8
[2] Fig 1 https://bioinformatics.oxfordjournals.org/content/30/23/3424.full