I intend to use voom+limma to analyse an RNA knockdown experiment that essentially looks like this:
design <- data.frame(
  gene  = c(rep("control", 2), rep("gene1", 6), rep("gene2", 6), rep("gene3", 6)),
  oligo = c(rep("control", 2),
            paste("oligo", rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), each = 2)))
)
For each target gene I have multiple oligos to control for off target effects. In addition I have non-treated controls.
For each target gene, I would like to identify genes that are DE in all the corresponding oligos. I know I can simply do it using a model matrix like this:
mod <- model.matrix(~oligo, data = design)
This way I can test for the effect of each oligo separately using its coefficient. Then I can extract the DE genes (e.g. using decideTests) and keep only the genes that are DE in all the coefficients corresponding to a given target gene.
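For concreteness, here is a minimal sketch of that workflow. It assumes a count matrix `counts` whose columns match the rows of `design`; the column positions for gene1's oligos (2:4) are illustrative and should be checked against `colnames(mod)`:

```r
library(limma)

mod <- model.matrix(~oligo, data = design)

# voom transformation followed by the usual limma fit
v   <- voom(counts, mod)
fit <- eBayes(lmFit(v, mod))

# -1/0/1 matrix of down/not-DE/up calls, one column per coefficient
dt <- decideTests(fit)

# Columns of mod for gene1's oligos (a, b, c); verify with colnames(mod)
g1_cols <- 2:4

# Keep genes called DE (in either direction) by every gene1 oligo
de_all     <- rowSums(dt[, g1_cols] != 0) == length(g1_cols)
gene1_hits <- rownames(dt)[de_all]
```

Note that this intersect-the-calls approach does not yield a single p-value per gene, which is the motivation for the single-contrast question below.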
Is there a more elegant way to express this approach as a single contrast for each gene?
Other suggestions for design and contrast matrices for this type of experiment are also welcome. The actual experiment is considerably bigger, with many more genes and oligos, but this is the general design.
Thank you for your detailed reply. I like your suggestion with the intersection union test, since that still produces a p-value for every gene.
To test whether the mean effect across all oligos for a single gene is significant, would the following contrast for the Gene1 be correct?: c(0, 1/3, 1/3, 1/3, 0, 0, 0, ...)?
Yes, that looks right.
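For reference, a sketch of applying that averaging contrast with contrasts.fit, assuming `fit` and `mod` come from the ~oligo model above (the coefficient positions 2:4 are illustrative; match them to `colnames(mod)`):

```r
library(limma)

# Average-effect contrast for gene1's three oligos
con <- rep(0, ncol(mod))
con[2:4] <- 1/3

fit2 <- contrasts.fit(fit, con)
fit2 <- eBayes(fit2)
topTable(fit2)  # moderated t-test of the mean gene1 effect vs control
```

Keep in mind this tests whether the average knockdown effect is nonzero, which can be driven by one strong oligo; it is not the same null hypothesis as requiring DE in every oligo.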
Just implemented the intersection union test. In some cases this gives rise to a p-value distribution heavily skewed towards 1.0 (similar to the scenario D here: http://varianceexplained.org/statistics/interpreting-pvalue-histogram/). Any idea of why this happens?
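For readers following along, a common way to implement the IUT over a set of coefficients is to take the maximum of the individual p-values, since the union null is rejected only if every individual test rejects. A sketch, assuming `fit` is the eBayes-moderated fit from the ~oligo model and columns 2:4 hold gene1's oligo coefficients (illustrative positions):

```r
# Per-gene p-values for gene1's individual oligo coefficients
p_ind <- fit$p.value[, 2:4]

# IUT p-value: the maximum across the individual tests
p_iut <- apply(p_ind, 1, max)
```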
The skew towards 1 is expected and is because the IUT is very conservative. This is necessary to deal with various factors, e.g., correlations between p-values for the individual pairwise comparisons, and type II error rates for those comparisons where the individual null hypothesis is false (but the union null across comparisons is still true). More generally, intersection operations are conservative, so the p-value skew just reflects that.
In that case, what would an appropriate FDR correction procedure be? In my understanding, most methods assume a uniform distribution of p-values towards 1.0, which is not the case here.
I don't think that's a problem. The BH method will just be more conservative if the p-value distribution is skewed towards 1 under the (union) null hypothesis. All other things being equal, you'll reject fewer of the true nulls than you would have if the p-value distribution were uniform, because the p-values for the true nulls are larger at every quantile of the distribution. This results in an FDR below the nominal threshold - not ideal for power, but not a disaster either, given that we're erring on the side of conservativeness.
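So the adjustment itself is just the standard one. A sketch, where `p_iut` is the per-gene vector of IUT p-values from the step above:

```r
# Benjamini-Hochberg adjustment; when the null p-value distribution
# is skewed towards 1, the realised FDR sits below the nominal level
fdr  <- p.adjust(p_iut, method = "BH")
hits <- names(fdr)[fdr < 0.05]
```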