Question

Custom dispersion estimate DESeq2

Sindre ▴ 110

@sindre-6193

Last seen 3.7 years ago

I am curious about a design with many conditions: say, a control group and a disease group, with two different treatments performed on both groups (e.g. pre-treatment, post-treatment 1 and post-treatment 2 values for both the control and disease groups).

Let's say dispersion is very different from one condition to another: it's higher in the disease group than in the control group, very high in samples after treatment 1, and extremely high after treatment 2. Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

deseq2 edger • 514 views

updated 3.6 years ago by Aaron Lun ★ 28k • written 3.6 years ago by Sindre ▴ 110

score 0 · Answer 1 · 2020-08-27

Michael Love 41k

@mikelove

Last seen 8 hours ago

United States

I'll just say, as a matter of software, DESeq2 does not have any support for separate dispersion estimates across groups.

score 0 · Answer 2 · 2020-08-28

Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

Most certainly not. The variability in the treatments is real; dismissing it would be dangerous.

The unsaid question (that Mike touched on) is whether different dispersions are supported for each group. In the distant past, I added some functionality to edgeR to accept a matrix of dispersions - see, for example, the description of the dispersion= argument in glmFit(). (To be honest, I don't quite remember why I did this; it was probably something single-cell-related, and I haven't used it since.) This means that you could set up a matrix where, for each gene, all observations from the same group get one dispersion value and all observations in another group get another dispersion.
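To make the layout of such a matrix concrete, here is a minimal base-R sketch. The gene names, sample names and dispersion values are all made up for illustration; only the shape matters (one row per gene, one column per sample, with samples in the same group sharing a value).

```r
# Hypothetical toy setup: 3 genes, 6 samples in two groups.
group <- factor(rep(c("control", "disease"), each = 3))
genes <- c("geneA", "geneB", "geneC")

# Made-up per-gene dispersions, one column per group.
group_disp <- cbind(control = c(0.05, 0.10, 0.02),
                    disease = c(0.20, 0.40, 0.15))
rownames(group_disp) <- genes

# Expand to one column per sample by indexing the columns
# with each sample's group label.
disp_mat <- group_disp[, as.character(group), drop = FALSE]
colnames(disp_mat) <- paste0("sample", seq_along(group))

# disp_mat now has the same dimensions as the count matrix would have,
# which is the shape described for the dispersion= argument of glmFit().
print(disp_mat)
```

Every sample in a group reuses that group's column, so each gene gets one dispersion per group rather than one overall.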

So it's possible, but that really just kicks the can down the road because you're faced with the problem of trying to estimate these group-specific dispersions. This is... also theoretically possible with the QL machinery in edgeR, but it would involve some experimentation. If you're curious, the general idea would be to (i) split the dataset by group, (ii) run estimateDisp() on each subset of samples, (iii) cbind the trended dispersions together into a matrix, (iv) feed that matrix into glmQLFit() and (v) hope for the best. Don't treat that as a recommendation, though; I have no idea how or if it will work out.
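For what it's worth, steps (i) to (iv) might look roughly like the sketch below. It runs on simulated counts (the group labels, gene count and simulation settings are assumptions for illustration), it is untested on real data, and whether glmQLFit() handles a dispersion matrix the same way may depend on the edgeR version, so treat it as an experiment rather than a recipe.

```r
library(edgeR)

set.seed(1)
n_genes <- 500
group <- factor(rep(c("control", "disease"), each = 4))

# Simulated counts: the second group is noisier (smaller size = larger dispersion).
counts <- cbind(
  matrix(rnbinom(n_genes * 4, mu = 100, size = 20), nrow = n_genes),
  matrix(rnbinom(n_genes * 4, mu = 100, size = 2),  nrow = n_genes)
)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_along(group))

y <- DGEList(counts, group = group)
y <- calcNormFactors(y)

# (i) + (ii): estimate dispersions separately within each group,
# using an intercept-only design for each subset.
disp_by_group <- lapply(levels(group), function(g) {
  sub <- y[, group == g]
  sub_design <- matrix(1, nrow = ncol(sub), ncol = 1)
  estimateDisp(sub, design = sub_design)$trended.dispersion
})
names(disp_by_group) <- levels(group)

# (iii): bind into a genes x samples matrix, one column per sample.
disp_mat <- do.call(cbind, disp_by_group)[, as.character(group), drop = FALSE]
dimnames(disp_mat) <- dimnames(counts)

# (iv): supply the matrix to the QL fit on the full dataset.
design <- model.matrix(~ group)
fit <- glmQLFit(y, design, dispersion = disp_mat)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

Step (v) is left to the reader.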

My standard approach for dealing with this situation would be to use voomWithQualityWeights() from limma.
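A minimal sketch of that route, again on simulated counts (the data and the two-group design are assumptions for illustration, not taken from the question):

```r
library(edgeR)
library(limma)

set.seed(1)
n_genes <- 500
group <- factor(rep(c("control", "disease"), each = 4))

# Simulated counts with a noisier second group.
counts <- cbind(
  matrix(rnbinom(n_genes * 4, mu = 100, size = 20), nrow = n_genes),
  matrix(rnbinom(n_genes * 4, mu = 100, size = 2),  nrow = n_genes)
)
rownames(counts) <- paste0("gene", seq_len(n_genes))
colnames(counts) <- paste0("sample", seq_along(group))

y <- DGEList(counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~ group)

# Combine voom's observation-level precision weights with
# sample-specific quality weights, so noisier samples are downweighted.
v <- voomWithQualityWeights(y, design, plot = FALSE)

# Standard limma pipeline on the weighted data.
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = 2)
```

If the extra variability is expected to be shared within a condition, the var.group argument of voomWithQualityWeights() may be worth a look, since it is meant to estimate variability per group of samples rather than per sample.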

P.S. I just noticed the title. If this is meant to be a DESeq2 question, are you just tagging the edgeR maintainers for fun? I'm not sure I like that.