Question

Running DESeq2 with a gene-dependent binary variable such as membership in a pathway?

yuri.pritykin • 0

@yuripritykin-14450

Last seen 6.4 years ago

New York

Is it possible to run DESeq2 with a gene-dependent binary variable, such as membership in a pathway or association with a transcription factor? If not, is it possible in principle to modify functions such as fitNbinomGLMs, fitBeta, etc. so that this becomes possible in a meaningful way, or am I missing some issue that prevents it?

deseq2 • 1.2k views

ADD COMMENT • link updated 6.4 years ago by indradhawa1562 • 0 • written 6.4 years ago by yuri.pritykin • 0

To me, at least, it's unclear what this would achieve. Pathway membership is usually handled by something akin to GSEA after generating a statistic with DESeq2. The only other thing I can think of is subsetting your data by your binary variable, but there would be very little difference (if any) between subsetting before or after running DESeq2, so it seems a little pointless. If you're talking about a gene-and-sample-dependent variable, then again we'd need to see what you're trying to achieve by introducing such a variable. Otherwise your gene-dependent variable is necessarily constant across samples, and therefore has no part to play in DESeq2's approach.
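The point about a gene-dependent variable being constant across samples can be illustrated with a small sketch. It assumes a hypothetical experiment with six samples (three control, three treated) and uses plain Python rather than DESeq2 itself; the helper names `matrix_rank` and `design_for_gene` are made up for illustration. Within a single gene's model, a pathway-membership column takes the same value for every sample, so it is either all zeros or a copy of the intercept column and adds nothing to the rank of the design matrix.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column at or below the current rank.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical layout: 3 control samples followed by 3 treated samples.
condition = [0, 0, 0, 1, 1, 1]


def design_for_gene(in_pathway):
    """Per-gene design: intercept, condition, pathway membership (same for all samples)."""
    return [[1, c, in_pathway] for c in condition]


rank_member = matrix_rank(design_for_gene(1))        # membership column equals the intercept
rank_non_member = matrix_rank(design_for_gene(0))    # membership column is all zeros
rank_without = matrix_rank([[1, c] for c in condition])

# Three columns, but the rank stays at 2 in both cases.
print(rank_member, rank_non_member, rank_without)
```

Because the per-gene fit cannot separate such a column from the intercept, its coefficient is not estimable within one gene; any effect of the variable would have to be modelled across genes, which is a different kind of model.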

ADD REPLY • link 6.4 years ago Gavin Kelly ▴ 680

Thank you for the opinion. The idea is to compare groups of genes with different values of the introduced variable. I agree that making the variable sample-dependent as well (which in principle makes sense, e.g. for TF binding) would be more interesting. In addition, some kind of regularization over the coefficients of the new gene-dependent variable may make sense, but then it would be even more distant from what DESeq2 is now. I'll try to think more about it.

ADD REPLY • link 6.4 years ago yuri.pritykin • 0

And regarding your point that subsetting the data by the binary variable and then running DESeq2 differs little from running DESeq2 first and then subsetting: do I understand correctly that this indeed makes little difference for fitting the GLM for each gene, but that for estimating the dispersion it really matters which set of genes goes into the estimation?

ADD REPLY • link 6.4 years ago yuri.pritykin • 0

score 1 · Answer 1 · 2017-11-23

Wolfgang Huber ★ 13k

@wolfgang-huber-3550

Last seen 15 days ago

EMBL European Molecular Biology Laborat…

Using gene-dependent covariates can make sense, not for DESeq2 per se, but for the subsequent multiple testing analysis. Have a look at the IHW package and its associated paper. In this way, you can take advantage of the fact that the prior probability of being differentially expressed is quite different between the groups, or of differences in the tests' power, if you have reason to believe these exist.

In fact, this applies not only to multiple testing analyses following DESeq2, but following any method that conducts multiple tests.
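As a rough illustration of how a gene-level covariate can enter the multiple testing step, here is a minimal sketch of the weighted Benjamini-Hochberg procedure in plain Python. This is not the IHW algorithm: IHW learns the weights from the data, whereas the weights below are hand-picked assumptions, and the p-values and the pathway grouping are made-up numbers. The function name `weighted_bh` is hypothetical.

```python
def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg: run BH on p_i / w_i.

    Weights must be non-negative and average to 1; alpha is assumed to be in (0, 1).
    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    if m == 0:
        return []
    if len(weights) != m:
        raise ValueError("need one weight per p-value")
    if any(w < 0 for w in weights) or abs(sum(weights) / m - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and average to 1")

    # A weight of zero means the hypothesis can never be rejected.
    scaled = [min(p / w, 1.0) if w > 0 else 1.0 for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: scaled[i])

    # Find the largest rank k whose scaled p-value is at most alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if scaled[i] <= alpha * rank / m:
            k_max = rank

    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Made-up p-values: the first four genes belong to a hypothetical pathway
# believed (before looking at the data) to be enriched for true effects.
pvalues = [0.004, 0.03, 0.045, 0.06, 0.3, 0.5, 0.7, 0.9]
in_pathway = [True, True, True, True, False, False, False, False]

# Hand-picked weights (an assumption): up-weight pathway genes, mean weight is 1.
weights = [1.5 if member else 0.5 for member in in_pathway]

unweighted = weighted_bh(pvalues, [1.0] * len(pvalues), alpha=0.1)
weighted = weighted_bh(pvalues, weights, alpha=0.1)

print(sum(unweighted), sum(weighted))
```

With these numbers the unweighted procedure rejects one hypothesis and the weighted one rejects four. The weights have to be chosen without looking at the p-values themselves (or learned in a way that accounts for this, which is what IHW is designed to do); otherwise the error control no longer holds.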