Custom dispersion estimate DESeq2
Sindre ▴ 100
Last seen 11 months ago

I am curious about a design with many conditions: say, a control group and a disease group, with two different treatments performed on both groups (e.g. pre-treatment, post-treatment 1 and post-treatment 2 values for both the control and disease group).

Let's say dispersion is very different from one condition to another: it's higher in the disease group than in the control group, very high in samples after treatment 1, and extremely high after treatment 2. Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

deseq2 edger • 152 views
Last seen 19 hours ago
United States

I'll just say, as a matter of software, DESeq2 does not have any support for separate dispersion estimates across groups.

Aaron Lun ★ 27k
Last seen 19 hours ago
The city by the bay

Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

Most certainly not. The variability in the treatments is real; dismissing it would be dangerous.

The unsaid question (that Mike touched on) is whether different dispersions are supported for each group. In the distant past, I added some functionality to edgeR to accept a matrix of dispersions - see, for example, the description of the dispersion= argument in glmFit(). (To be honest, I don't quite remember why I did this; it was probably something single-cell-related, and I haven't used it since.) This means that you could set up a matrix where, for each gene, all observations from the same group get one dispersion value and all observations in another group get another dispersion.
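To make the matrix idea concrete, here is a minimal sketch, assuming simulated counts and made-up dispersion values. The group names, the 0.05 and 0.20 dispersions, and the object names (`disp_by_group`, `disp_mat`) are placeholders for illustration only, not estimates from any real data. It only shows the shape of the input that the `dispersion=` argument of `glmFit()` is documented to accept (one value per observation).

```r
library(edgeR)

set.seed(42)
n_genes <- 100
group <- factor(rep(c("control", "disease"), each = 4))

# Simulated counts, purely for illustration
counts <- matrix(rnbinom(n_genes * length(group), mu = 100, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_along(group))))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ group)

# Hypothetical per-gene dispersions, one column per group (placeholder values)
disp_by_group <- cbind(control = rep(0.05, n_genes),
                       disease = rep(0.20, n_genes))

# Expand to a genes x samples matrix: every observation in a group
# receives that group's dispersion for the gene
disp_mat <- unname(disp_by_group[, as.character(group), drop = FALSE])

# glmFit() accepts a matrix with one dispersion per observation
fit <- glmFit(y, design, dispersion = disp_mat)
```

The hard part, as discussed in this answer, is not passing the matrix in but obtaining sensible values to fill it with.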

So it's possible, but that really just kicks the can down the road because you're faced with the problem of trying to estimate these group-specific dispersions. This is... also theoretically possible with the QL machinery in edgeR, but it would involve some experimentation. If you're curious, the general idea would be to (i) split the dataset into each group, (ii) run estimateDisp() on each subset of samples, (iii) cbind the trended dispersions together into a matrix, (iv) feed that matrix into glmQLFit(), and (v) hope for the best. Don't treat that as a recommendation, though; I have no idea how or if it will work out.
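For the curious, steps (i) to (iii) might look roughly like the sketch below. Everything here is an assumption made for illustration: the counts are simulated, the four group names and their dispersions are invented, and an intercept-only design is used within each subset. Step (iv) is left as a comment because it is untested, and whether `glmQLFit()` handles a matrix sensibly may depend on the edgeR version.

```r
library(edgeR)

set.seed(1)
n_genes <- 500
group <- factor(rep(c("ctrl_pre", "ctrl_post1", "dis_pre", "dis_post1"), each = 4))

# Simulated data: each group gets a different true dispersion (1 / size)
size_by_group <- c(ctrl_pre = 20, ctrl_post1 = 10, dis_pre = 5, dis_post1 = 2)
mu <- 2^runif(n_genes, 4, 10)
counts <- sapply(as.character(group), function(g) {
  rnbinom(n_genes, mu = mu, size = size_by_group[[g]])
})
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)),
                         paste0("s", seq_along(group)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)

# (i)-(iii): estimate trended dispersions within each group and place
# them in the columns belonging to that group's samples
disp_mat <- matrix(NA_real_, nrow = nrow(y), ncol = ncol(y))
for (g in levels(group)) {
  idx <- which(group == g)
  sub <- y[, idx]
  # Intercept-only design: all samples in the subset are replicates
  sub <- estimateDisp(sub, design = matrix(1, nrow = length(idx), ncol = 1))
  disp_mat[, idx] <- sub$trended.dispersion
}

# (iv)-(v): untested, version-dependent; shown only to indicate intent
# design <- model.matrix(~ 0 + group)
# fit <- glmQLFit(y, design, dispersion = disp_mat)
```

With only a few replicates per group, each subset's trend is estimated from very little information, which is one reason to treat this as an experiment rather than a workflow.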

My standard approach for dealing with this situation would be to use voomWithQualityWeights().
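A minimal sketch of that route, assuming simulated counts in which the second group is noisier than the first (the group names and dispersion values are invented for illustration). `voomWithQualityWeights()` combines the voom precision weights with estimated sample-level quality weights, so more variable samples are down-weighted rather than ignored.

```r
library(limma)
library(edgeR)

set.seed(7)
n_genes <- 500
group <- factor(rep(c("control", "disease"), each = 4))
mu <- 2^runif(n_genes, 4, 10)

# Simulated counts: the disease samples are more variable (smaller size)
counts <- cbind(
  matrix(rnbinom(n_genes * 4, mu = mu, size = 20), nrow = n_genes),
  matrix(rnbinom(n_genes * 4, mu = mu, size = 3),  nrow = n_genes)
)
dimnames(counts) <- list(paste0("gene", seq_len(n_genes)),
                         paste0("s", seq_along(group)))

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ group)

# Observation-level weights that incorporate sample quality weights
v <- voomWithQualityWeights(y, design)

# Standard limma pipeline on the weighted log-CPM values
fit <- eBayes(lmFit(v, design))
res <- topTable(fit, coef = 2, number = 10)
```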

P.S. I just noticed the title. If this is meant to be a DESeq2 question, are you just tagging the edgeR maintainers for fun? I'm not sure I like that.

