I received a question about RIP-Seq by email, and wanted to post a reminder about how to test for ratio of ratios using DESeq2.
Suppose we have two assays: Input and IP, and we have two conditions: A and B.
Then using a design: '~ assay + condition + assay:condition', the interaction term 'assay:condition' represents the ratio of the ratios: (IP for B / Input for B) / (IP for A / Input for A)... which with some algebra you can see is equivalent to (IP for B / IP for A) / (Input for B / Input for A).
Using a likelihood ratio test which removes the interaction term in the reduced model, we test whether the IP enrichment is different in condition B than in A:
dds <- DESeqDataSet(se, design= ~ assay + condition + assay:condition) dds <- DESeq(dds, test="LRT", reduced= ~ assay + condition) results(dds)
(Or similarly, using another DESeqDataSet constructor function).
Note: if there are only 4 samples total (no replicates), then there are not enough residual degrees of freedom to estimate the dispersion, and DESeq2 will automatically use a design of ~ 1 for estimating the dispersion (and will print a message about this). This means that the analysis is underpowered to detect differences, because we are overestimating the dispersion in ignoring the differences due to experimental design. See ?DESeq for more on this case.
what if the depth is lower for the IP samples? This is handled by the size factors in the model.
what if the distribution of counts is different for the IP samples? Suppose first that the counts are still decently modeled with a Negative Binomial. DESeq2 estimates a single dispersion parameter per gene. If the IP samples have a higher dispersion than the Input samples, then the dispersion estimate from all samples will be in-between the dispersion estimate you would get from the assays separately. I would guess that this wouldn't be too problematic, but it's hard to say without seeing the actual data, and will depend on how far apart the dispersions of the two assays are. Secondly, what if the IP counts are be poorly fit with a Negative Binomial? Again, it's hard to say exactly how this would effect the performance without knowing how far off from a Negative Binomial they are.