Question: Design matrix and contrast for RNA knockdown experiment
0
2.9 years ago by
maltethodberg130
Sweden
maltethodberg130 wrote:

I intend to use voom+limma to analyse an RNA knockdown experiment that essentially looks like this:

design <- data.frame(gene=c(rep("control", 2), rep("gene1", 6), rep("gene2", 6), rep("gene3", 6)),
oligo=c(rep("control", 2), paste("oligo", c("a", "a", "b", "b", "c", "c",
"d", "d", "e", "e", "f", "f",
"g", "g", "h", "h", "i", "i"))))

For each target gene I have multiple oligos to control for off target effects. In addition I have non-treated controls.

For each target gene, I would like to identify genes that are DE in all the corresponding oligos. I know I can simply do it using a model matrix like this:

mod <- model.matrix(~oligo)

This way I can test for an effect of each oligo separately using the coefficients. Then I can extract all DE genes (i.e. using decideTests) and then only keep genes that are DE in all tested coefficients for each gene.

Is there a more elegant way to express this approach as a single contrast for each gene?

Other suggestions for design and contrast matrices for this type of experiment are also welcome. The actually experiment is considerably bigger, with many more genes and oligos, but this is the general design.

modified 2.9 years ago by Aaron Lun24k • written 2.9 years ago by maltethodberg130
Answer: Design matrix and contrast for RNA knockdown experiment
4
2.9 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

For your specific purpose, I don't think there's a more elegant method than what you propose. If you want genes that are DE for each oligo versus the control for a given set of oligos (presumably against the same target gene), then you'll obviously have to test each pairwise oligo-control comparison for DE, and then intersect the results. Other contrasts don't quite do the same job - for example, doing some ANOVA-like comparison would reject the null if any of the oligos were significantly different from the control, and similarly, comparing the average expression of the oligo groups against that of the control doesn't guarantee that each oligo is significant.

So, I would stick with a one-way layout, and perform pairwise comparisons between each of the oligo groups against the control group. As for the intersection itself, there's a couple of ways to do it:

• The quick-and-dirty approach is to take the DE list of each oligo-control comparison at the relevant FDR (e.g., 5%, using decideTests), and then take the intersection of the lists across multiple comparisons.
• The more rigorous approach is to, for each gene, take the maximum p-value across all oligo-control comparisons of interest. This represents the p-value from an intersection-union test (see various work from Berger, e.g., here), and you can then apply the BH method to those p-values to control the FDR across all genes.

The second approach provides a better guarantee of error control, while the first approach is more intuitive - but I'd imagine that both should be fairly similar in your case where there (hopefully) should be strong correlations between oligos targeting the same gene. It's probably wise to check that each knockdown was successful, otherwise your intersection would be empty if any oligo had no DE genes.

Thank you for your detailed reply. I like your suggestion with the intersection union test, since that still produces a p-value for every gene.
To test whether the mean effect across all oligos for a single gene is significant, would the following contrast for the Gene1 be correct?: c(0, 1/3, 1/3, 1/3, 0, 0, 0, ...)?

Yes, that looks right.

Just implemented the intersection union test. In some cases this gives rise to a p-value distribution heavily skewed towards 1.0 (similar to the scenario D here: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/). Any idea of why this happens?

The skew towards 1 is expected and is because the IUT is very conservative. This is necessary to deal with various factors, e.g., correlations between p-values for the individual pairwise comparisons, type II error rates for those comparisons where the individual null hypothesis is false (but the union null across comparisons is still true). More generally, intersection operations are conservative, so the p-value skew just reflects that.

In that case, what would an appropriate FDR correction procedure be? In my understanding, most methods assume a  uniform distribution of p-values towards 1.0, which is not the case here.