I intend to use voom+limma to analyse an RNA knockdown experiment that essentially looks like this:
design <- data.frame(
  gene  = c(rep("control", 2), rep("gene1", 6), rep("gene2", 6), rep("gene3", 6)),
  oligo = c(rep("control", 2),
            paste("oligo", rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), each = 2)))
)
For each target gene I have multiple oligos to control for off target effects. In addition I have non-treated controls.
For each target gene, I would like to identify genes that are DE in all the corresponding oligos. I know I can simply do it using a model matrix like this:
mod <- model.matrix(~oligo, data = design)
This way I can test for the effect of each oligo separately using its coefficient. Then I can extract the DE genes (e.g. using decideTests) and keep only the genes that are DE in all the coefficients corresponding to a given target gene.
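For concreteness, here is a minimal sketch of that workflow. It assumes a count matrix `counts` whose columns match the rows of `design`; the column positions for gene1's oligos (2:4) are illustrative and should be checked against `colnames(mod)`:

```r
library(limma)

mod <- model.matrix(~oligo, data = design)

# voom transformation followed by the usual limma fit
v   <- voom(counts, mod)
fit <- eBayes(lmFit(v, mod))

# -1/0/1 matrix of down/not-DE/up calls, one column per coefficient
dt <- decideTests(fit)

# Columns of mod for gene1's oligos (a, b, c); verify with colnames(mod)
g1_cols <- 2:4

# Keep genes called DE (in either direction) by every gene1 oligo
de_all     <- rowSums(dt[, g1_cols] != 0) == length(g1_cols)
gene1_hits <- rownames(dt)[de_all]
```

Note that this intersect-the-calls approach does not yield a single p-value per gene, which is the motivation for the single-contrast question below.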
Is there a more elegant way to express this approach as a single contrast for each gene?
Other suggestions for design and contrast matrices for this type of experiment are also welcome. The actual experiment is considerably bigger, with many more genes and oligos, but this is the general design.
Thank you for your detailed reply. I like your suggestion with the intersection union test, since that still produces a p-value for every gene.
To test whether the mean effect across all oligos for a single gene is significant, would the following contrast for the Gene1 be correct?: c(0, 1/3, 1/3, 1/3, 0, 0, 0, ...)?
Yes, that looks right.
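For reference, a sketch of applying that averaging contrast with contrasts.fit, assuming `fit` and `mod` come from the ~oligo model above (the coefficient positions 2:4 are illustrative; match them to `colnames(mod)`):

```r
library(limma)

# Average-effect contrast for gene1's three oligos
con <- rep(0, ncol(mod))
con[2:4] <- 1/3

fit2 <- contrasts.fit(fit, con)
fit2 <- eBayes(fit2)
topTable(fit2)  # moderated t-test of the mean gene1 effect vs control
```

Keep in mind this tests whether the average knockdown effect is nonzero, which can be driven by one strong oligo; it is not the same null hypothesis as requiring DE in every oligo.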
Just implemented the intersection union test. In some cases this gives rise to a p-value distribution heavily skewed towards 1.0 (similar to the scenario D here: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/). Any idea of why this happens?
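For readers following along, a common way to implement the IUT over a set of coefficients is to take the maximum of the individual p-values, since the union null is rejected only if every individual test rejects. A sketch, assuming `fit` is the eBayes-moderated fit from the ~oligo model and columns 2:4 hold gene1's oligo coefficients (illustrative positions):

```r
# Per-gene p-values for gene1's individual oligo coefficients
p_ind <- fit$p.value[, 2:4]

# IUT p-value: the maximum across the individual tests
p_iut <- apply(p_ind, 1, max)
```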
The skew towards 1 is expected and is because the IUT is very conservative. This is necessary to deal with various factors, e.g., correlations between p-values for the individual pairwise comparisons, and type II error rates for those comparisons where the individual null hypothesis is false (but the union null across comparisons is still true). More generally, intersection operations are conservative, so the p-value skew just reflects that.
In that case, what would an appropriate FDR correction procedure be? In my understanding, most methods assume a uniform distribution of p-values towards 1.0, which is not the case here.
I don't think that's a problem. The BH method will just be more conservative if the p-value distribution is skewed towards 1 under the (union) null hypothesis. All other things being equal, you'll reject fewer of the true nulls than you would have if the p-value distribution were uniform, because the p-values for the true nulls are larger at every quantile of the distribution. This results in an FDR below the nominal threshold - not ideal for power, but not a disaster either, given that we're erring on the side of conservativeness.
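So the adjustment itself is just the standard one. A sketch, where `p_iut` is the per-gene vector of IUT p-values from the step above:

```r
# Benjamini-Hochberg adjustment; when the null p-value distribution
# is skewed towards 1, the realised FDR sits below the nominal level
fdr  <- p.adjust(p_iut, method = "BH")
hits <- names(fdr)[fdr < 0.05]
```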