Question

How does DESeq2 estimate dispersions?

0

li.keren.cn • 0

@likerencn-22267

Last seen 5.3 years ago

We are using DESeq2 to analyze Ribo-seq data pointwise and are trying to replicate its dispersion estimation procedure. That is, we focus on the reads at each codon rather than the total count per gene, so a large proportion of the counts are small. We are having trouble replicating the DESeq2 method on our data. Could you help us?

First, following Anders and Huber (2010), we obtained many negative raw dispersions at small relative means $q$. I found that in the DESeq2 code, estimated raw dispersions less than 0.04 are set to 0.04. In addition, the formula differs from Anders and Huber (2010): in that paper the raw dispersion is estimated as $(w-z)/q^2$, whereas the DESeq2 code uses $(w-q)/q^2$. I also do not understand why the trimmedVariance() function has scale = 1.51.
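For reference, the method-of-moments estimate $(w-z)/q^2$ from Anders and Huber (2010) can be sketched as below. This is a minimal illustration, not DESeq2 code: the function name `raw_dispersion` and the example counts are made up, and $z = (q/m)\sum_j 1/s_j$ is taken from that paper's definition. It shows why tiny counts give negative values (the sample variance $w$ falls below the shot-noise term $z$), and that with all size factors equal to 1, $z$ reduces to $q$, so the two expressions quoted above coincide in that special case.

```python
def raw_dispersion(counts, size_factors):
    """Method-of-moments raw dispersion (w - z) / q^2 for one gene or codon.

    Requires at least two samples and a nonzero mean (q > 0).
    """
    m = len(counts)
    # Counts brought to a common scale by dividing by the size factors.
    normalized = [k / s for k, s in zip(counts, size_factors)]
    q = sum(normalized) / m                                # mean of normalized counts
    w = sum((x - q) ** 2 for x in normalized) / (m - 1)    # sample variance
    z = q / m * sum(1.0 / s for s in size_factors)         # shot-noise correction
    return (w - z) / q ** 2


# Hypothetical data: a low-count codon and a higher-count one, size factors all 1.
low = raw_dispersion([0, 1, 0, 1], [1.0, 1.0, 1.0, 1.0])
high = raw_dispersion([10, 30, 5, 40], [1.0, 1.0, 1.0, 1.0])
print(low, high)
```

In this example `low` is negative and `high` is positive, which matches the pattern described above for small means.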

My second question: where is dispersions() defined? Why does its result differ from that of parametricDispersionFit(), which uses a robust gamma GLM? I can find it in the package, but I cannot find its definition on GitHub. Could you please give me some details on how dispersions() estimates the dispersion?

Thank you for your time and help.

DESeq2 Ribo-Seq • 1.2k views

ADD COMMENT • link updated 5.4 years ago by James W. MacDonald 68k • written 5.4 years ago by li.keren.cn • 0

score 1 · Answer 1 · 2019-11-01

1

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

The best place to understand how dispersion is estimated would be the DESeq2 paper. We first compute a maximum likelihood estimate (MLE) using a likelihood adjusted with the Cox-Reid term. Then we perform a Bayesian update.
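The two steps can be illustrated with a toy sketch. This is not the DESeq2 implementation: it assumes an intercept-only design with all size factors equal to 1, replaces the optimizer with a plain grid search, and uses made-up values for the trend dispersion (`alpha_trend`) and prior width (`sigma`), which DESeq2 estimates from the data. All function names are hypothetical. The prior is a normal density on log dispersion centered on the trend, as described in the paper.

```python
import math


def nb_loglik(counts, mu, alpha):
    """Negative binomial log-likelihood with variance mu + alpha * mu^2."""
    r = 1.0 / alpha  # NB "size" parameter
    total = 0.0
    for k in counts:
        total += (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                  + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return total


def cox_reid_loglik(counts, alpha):
    """Log-likelihood minus the Cox-Reid term 0.5 * log det(X^T W X).

    For an intercept-only design X is a column of ones, so the
    determinant is the sum of the weights 1 / (1/mu + alpha).
    """
    m = len(counts)
    mu = sum(counts) / m  # fitted mean under the intercept-only model
    weight = 1.0 / (1.0 / mu + alpha)
    return nb_loglik(counts, mu, alpha) - 0.5 * math.log(m * weight)


def log_prior(alpha, alpha_trend, sigma):
    """Log of a normal prior on log(alpha), centered on the trend value."""
    return -((math.log(alpha) - math.log(alpha_trend)) ** 2) / (2.0 * sigma ** 2)


def argmax_log_grid(objective, lo=1e-6, hi=10.0, n=2000):
    """Return the grid point (evenly spaced in log alpha) maximizing objective."""
    best_alpha, best_value = lo, objective(lo)
    for i in range(1, n + 1):
        alpha = math.exp(math.log(lo) + (math.log(hi) - math.log(lo)) * i / n)
        value = objective(alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha


counts = [10, 30, 5, 40, 15, 20]   # hypothetical counts for one gene
alpha_trend, sigma = 0.01, 0.5     # assumed trend value and prior width

alpha_mle = argmax_log_grid(lambda a: cox_reid_loglik(counts, a))
alpha_map = argmax_log_grid(
    lambda a: cox_reid_loglik(counts, a) + log_prior(a, alpha_trend, sigma))
print(alpha_mle, alpha_map)
```

In this toy setting the posterior mode lands between the trend value and the gene-wise MLE, which is the shrinkage behavior the Bayesian update is meant to produce.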

You may be getting confused in the code by some functions that are only used to initialize the fitting algorithm; their output is not the final value (neither the MLE nor the posterior estimate).

ADD COMMENT • link 5.4 years ago Michael Love 43k

0

Thank you. Are you talking about this paper: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2"?

ADD REPLY • link 5.4 years ago li.keren.cn • 0

1

Yes. The Bioc packages each have a citation. You may also want to skim the DESeq2 vignette, which has a lot of details about the software and paper.

ADD REPLY • link 5.4 years ago Michael Love 43k

0

Is it possible to remove the integer requirement of DESeqDataSet? I have different abundance sets for different genes, so I cannot account for them through the sizeFactors() function.

ADD REPLY • link 5.4 years ago li.keren.cn • 0

0

No, the integer requirement is not optional at present. We assume you have counts as input. You can, however, supply a matrix of offsets; again, see the vignette for details.
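As a rough illustration of what such a matrix looks like: it has one factor per gene and sample (same shape as the count matrix), and the vignette, as far as I recall, suggests scaling the factors so that each row has a geometric mean near 1 before assigning them with `normalizationFactors()`. The sketch below only shows that row-wise centering step in plain Python; the function name `center_rows_geometric` and the factor values are hypothetical, and it is not a substitute for the R code in the vignette.

```python
import math


def center_rows_geometric(norm_factors):
    """Divide each row by its geometric mean so the row-wise geometric mean is 1.

    All factors must be strictly positive.
    """
    centered = []
    for row in norm_factors:
        geo_mean = math.exp(sum(math.log(x) for x in row) / len(row))
        centered.append([x / geo_mean for x in row])
    return centered


# Hypothetical gene-by-sample factors (2 genes, 3 samples).
raw_factors = [[1.0, 2.0, 4.0],
               [0.5, 0.5, 2.0]]
centered = center_rows_geometric(raw_factors)
print(centered)
```

The counts themselves stay as integers; only the factors carry the gene-specific scaling.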

ADD REPLY • link 5.4 years ago Michael Love 43k