Question: Running DESeq2 with a gene-dependent binary variable such as membership in a pathway?
11 months ago, yuri.pritykin (New York) wrote:

Is it possible to run DESeq2 with a gene-dependent binary variable such as membership in a pathway or transcription factor association? If not, is it possible in principle to modify functions such as fitNbinomGLMs, fitBeta, etc. so that this becomes possible in a meaningful way, or am I missing some issue that prevents it?


To me at least, it's unclear what could be achieved by this. Membership of a pathway is usually dealt with by something akin to GSEA after generating a statistic via DESeq2. The only other thing I can think of is subsetting your data by your binary variable, but there would be very little difference (if any) between doing this subsetting before or after running DESeq2, so it seems a little pointless. If you're talking about a gene-and-sample dependent variable, then again we'd need to see what you're trying to achieve by introducing such a variable. Otherwise your gene-dependent variable is of necessity constant across samples, and therefore has no part to play in DESeq2's approach.

Reply written 11 months ago by Gavin Kelly
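
To illustrate the last point of the reply above: DESeq2 fits one GLM per gene, so a variable that only depends on the gene takes the same value in every sample of that fit. The sketch below uses a made-up layout of 3 control and 3 treated samples (the names `matrix_rank` and `design_for_gene` are hypothetical helpers, not DESeq2 functions) and shows that adding such a column does not raise the rank of the per-gene design matrix, so its coefficient cannot be separated from the intercept.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Assumed toy experiment: 3 control (0) and 3 treated (1) samples.
condition = [0, 0, 0, 1, 1, 1]


def design_for_gene(in_pathway):
    # Columns: intercept, condition, pathway membership.
    # Membership is a property of the gene, so it is identical in every row.
    return [[1, c, in_pathway] for c in condition]


rank_member = matrix_rank(design_for_gene(1))     # gene inside the pathway
rank_nonmember = matrix_rank(design_for_gene(0))  # gene outside the pathway
print(rank_member, rank_nonmember)  # prints "2 2": three columns, but rank 2
```

In both cases the three-column design has rank 2, the same as the design with intercept and condition alone. A gene-level variable could only become estimable in a model that pools information across genes, which is a different model from the per-gene fit.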

Thank you for the opinion. The idea is to compare groups of genes with different values of the introduced variable. I agree that making the variable sample-dependent as well (which in principle makes sense, e.g. for TF binding) would be more interesting. In addition, some kind of regularization over the coefficients of the new gene-dependent variable may make sense, but that would move even further away from what DESeq2 is now. I'll try to think more about it.

Reply written 11 months ago by yuri.pritykin

And regarding your point that subsetting the data by the binary variable and then running DESeq2 differs little from running DESeq2 first and then subsetting: do I understand correctly that for fitting the GLM for each gene there is probably indeed little difference, but that for estimating the dispersion it really matters which set of genes goes into the estimation?

Reply written 11 months ago by yuri.pritykin
11 months ago, Wolfgang Huber (EMBL European Molecular Biology Laboratory) wrote:

Using gene-dependent covariates can make sense - not for DESeq2 per se, but for the subsequent multiple testing analysis. Have a look at the IHW package and its associated paper. This lets you take advantage of a prior probability of being differentially expressed that is quite different between the groups, or of a difference in the tests' power if you have reason to believe there is one.

In fact, this applies not only to multiple testing analyses following DESeq2, but following any method that conducts multiple tests.

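
As a rough illustration of how a gene-level covariate can enter the multiple testing step rather than the model fit, here is a minimal sketch of a weighted Benjamini-Hochberg procedure. It is not the IHW algorithm: IHW learns the weights from the data, whereas here the weights are fixed by hand as an assumption, and the p-values, the `in_pathway` covariate and the function names are all made up for the example.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the set of indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k * alpha / m.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    return set(order[:k_max])


def weighted_bh(pvalues, weights, alpha):
    """BH applied to p_i / w_i; weights must be non-negative and average to 1."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) / len(weights) - 1.0) < 1e-9
    adjusted = [min(1.0, p / w) if w > 0 else 1.0
                for p, w in zip(pvalues, weights)]
    return benjamini_hochberg(adjusted, alpha)


# Made-up p-values for 8 genes; the first 4 belong to the pathway of interest.
pvalues = [0.02, 0.03, 0.045, 0.30, 0.20, 0.40, 0.60, 0.90]
in_pathway = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed weights, chosen by hand (not learned): up-weight pathway genes.
weights = [1.5 if g else 0.5 for g in in_pathway]

unweighted = benjamini_hochberg(pvalues, alpha=0.1)
weighted = weighted_bh(pvalues, weights, alpha=0.1)
print(sorted(unweighted))  # prints []
print(sorted(weighted))    # prints [0, 1, 2]
```

With these toy numbers, plain BH rejects nothing at alpha = 0.1, while the weighted version rejects three pathway genes. The covariate must be chosen independently of the p-values under the null hypothesis for weighting of this kind to be valid, which is why a real analysis would use the IHW package itself rather than hand-picked weights.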

Good, thank you for the link!

Reply written 11 months ago by yuri.pritykin