How Does DESeq2 estimate dispersions?
1
0
Entering edit mode
@likerencn-22267
Last seen 5.1 years ago

We are using DESeq2 to analysis some Ribo-seq data pointwise and try to replicate the dispersion estimation procedure. That is, we focus on the reads on each codon other than total count in gene. Thus there is a large proportion of small counts. We have problem to replicate the Deseq2 methods in our data. Could you help us with them?

First, following Anders and Huber (2010), we had lots of negative raw dispersions at small relative mean q’s. I found in the code of DESeq2, estimated raw dispersions lesser than 0.04 are set to be 0.04. Besides, the formula is different from Anders and Huber (2010). In that paper, raw dispersion is estimated through $(w-z)/q^2$, and in the DESeq2 code, $(w-q)/q^2$ is used instead. And I do not know why the trimmedVariance() function have a scale = 1.51.

Second question is, where is the definition of dispersions()? Why is its result different from parametricDispersionFit(), which is using a robust gamma GLM. I can find it in the package, but I cannot find its definition on the GitHub. Could you please give me some detail clue how dispersions() estimate dispersion?

Thank you for your time and help.

DESeq2 Ribo-Seq • 1.1k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 7 days ago
United States

The best place to understand how dispersion is estimated would be the DESeq2 paper. We first use a MLE on a likelihood adjusted with the Cox Reid term. Then we perform a Bayesian update.

You may be getting confused in the code by some functions that are just used to initiate the fitting algorithm but aren’t the final value (neither MLE not posterior estimate).

ADD COMMENT
0
Entering edit mode

Thank you. Are you talking about this paper: "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2"?

ADD REPLY
1
Entering edit mode

Yes. The Bioc packages each have a citation. Also you may want to scan the DESeq2 vignette which has a lot of details about the software and paper.

ADD REPLY
0
Entering edit mode

Is it possible to remove the integer requirement of DESeqDataSet? That is, I have different abundance sets for different genes, thus I cannot attribute them in the sizeFactors() function.

ADD REPLY
0
Entering edit mode

No the integer requirement is not optional now. We assume you have counts as input. You can however have a matrix of offsets, again, see the vignette for details.

ADD REPLY

Login before adding your answer.

Traffic: 343 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6