So I have a biological question that seems best understood as a change in ratio of two sequenced products between two conditions. So given a design of
counts ~ condition the null hypothesis is that for each gene
A and a fixed reference
log(mu_A_condition1) - log(mu_B_condition1) >= log(mu_A_condition2) - log(mu_B_condition2). This is partially derived from the need to treat the data as compositional, but retain the neat noise model of DESeq2, because just taking the log ratios and doing a linear regression (as in CoDA) seems to underestimate the noise and discard the fact that low counts are more noisy than high counts.
My understanding is that this null hypothesis can actually be tested with DESeq2 with a little tweak, because with dummy coding it translates to
beta_A_condition2 - beta_B_condition2 <= 0, so I can get the maximum likelihood estimate by just subtracting the two
beta values given by DESeq2, get the SE by taking
sqrt(beta_A_SE^2 + beta_B_SE_^2) (as the betas should have 0 covariance, right?) which is all I need to compute the Wald statistic and then I can once again carry on the same way DESeq2 does (p-adjustment, shrinkage, ...)
The question is:
- Does this seem sensible?
- What are some reasons this might be a bad idea?
- Are you aware of some work already using this approach?
- Is there some better alternative to this?
The only place I found this question asked previously is at https://www.biostars.org/p/470418/ where it got no answers.
Note that despite also being related to RIP-seq, this is a different question than DESeq2 testing ratio of ratios (RIP-Seq, CLIP-Seq, ribosomal profiling) as I do not want to compare ratio of ratios (IP/input) for the same gene between two conditions but rather want to get ratio of ratios (gene A / gene B) between IP and input.
Thanks for developing and maintaining DESeq2 - it is a great and robust tool!