Question: What is the correct contrast matrix if one treatment must be compared to both of the other two treatments, but separately?
3.1 years ago by
Peter0
United Kingdom
Peter0 wrote:

I have RNA-seq data from a single-factor experiment with three levels: non-treated control, placebo-treated negative control, and treated cells. I have 3 replicates for each.

After running edgeR, and also looking at expression levels, I noticed that the negative control is not a good control: it has DE genes compared to non-treated, while the treatment is not DE for these genes (it has the same expression levels as non-treated).

So, to be safe, I want to find genes that are DE in the treated cells compared to both the non-treated control and the negative control.

How should the contrast matrix be designed?

I suspect that the solution is to simply take the intersection of separately calculated DE gene lists (instead of building it into the contrast).

My design matrix:
    groups <- factor(c(0,0,0,1,1,1,2,2,2))
    design <- model.matrix(~ 0 + groups)
    colnames(design) <- c("O", "N", "T")

modified 3.1 years ago by Gavin Kelly560 • written 3.1 years ago by Peter0
Answer: What is the correct contrast matrix if one treatment must be compared to both of
3.1 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.3k wrote:

You are correct. The solution is to perform the individual contrasts separately and then take the intersection.
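
For illustration, here is a minimal sketch of what "separate contrasts, then intersect" could look like with the design from the question. The simulated counts, the 5% FDR cutoff, the helper name `de_genes`, and the choice of the quasi-likelihood pipeline are assumptions made for the example, not something stated in the thread.

```r
library(edgeR)

set.seed(1)
# Simulated counts purely for illustration: 200 genes x 9 samples
counts <- matrix(rnbinom(200 * 9, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:9)))

# Design matrix as given in the question
groups <- factor(c(0,0,0,1,1,1,2,2,2))
design <- model.matrix(~ 0 + groups)
colnames(design) <- c("O", "N", "T")

y <- DGEList(counts = counts, group = groups)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# One column per comparison; rows follow the design columns O, N, T
contrasts <- cbind(TvsO = c(-1, 0, 1),
                   TvsN = c(0, -1, 1))
rownames(contrasts) <- colnames(design)

# Hypothetical helper: IDs of genes passing an FDR cutoff for one contrast
de_genes <- function(fit, contrast, fdr = 0.05) {
  tab <- topTags(glmQLFTest(fit, contrast = contrast), n = Inf)$table
  rownames(tab)[tab$FDR < fdr]
}

de_TvsO <- de_genes(fit, contrasts[, "TvsO"])
de_TvsN <- de_genes(fit, contrasts[, "TvsN"])

# Genes that are DE in the treatment against both controls
de_both <- intersect(de_TvsO, de_TvsN)
```

Each contrast is tested on its own, so the contrast matrix itself stays simple; the "and" condition is applied afterwards on the gene lists. With the null simulated counts above, the lists will usually be empty.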

Answer: What is the correct contrast matrix if one treatment must be compared to both of
3.1 years ago by
Gavin Kelly560
United Kingdom / London / Francis Crick Institute
Gavin Kelly560 wrote:

I agree with Ryan's support of the 'intersection' approach you suggested, in cases where you wish your effect to be distinguishable from both types of control, but you'll be subject to two rounds of exposure to statistical errors, with a consequent loss of power. I think it might be worth looking at other approaches as well, depending on the definition of 'placebo' here. If the placebo induces some partially confounding effect (a scrambled siRNA vector, ...), then that is the true control: your three conditions are then O = baseline, N = baseline + placebo_effect, T = baseline + placebo_effect + biological_effect, so the N vs T contrast (T-N) gives you the straight biological effect, and you can effectively ignore the untreated group (apart from its contribution to the estimation of within-group noise). Eliminating genes that don't have a significant O vs T effect could, in this situation, decrease your power and potentially introduce bias against genes where the biological effect works to counteract the placebo effect (only the experimenter will know whether a gene that is upregulated in response to placebo, and reverts to baseline on full treatment, is interesting or not).
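
To make the additive picture concrete, here is a small base-R sketch of how the two contrasts behave for a gene where the biological effect cancels the placebo effect. The numbers and the helper `group_means` are made up for illustration; values are meant as log-scale group means.

```r
# Group means under the additive model described above (log scale).
# All numbers are hypothetical.
group_means <- function(baseline, placebo, biological) {
  c(O = baseline,
    N = baseline + placebo,
    T = baseline + placebo + biological)
}

# Contrast weights over the groups O, N, T
contrasts <- cbind(TvsO = c(O = -1, N = 0, T = 1),
                   TvsN = c(O = 0, N = -1, T = 1))

# Hypothetical gene: placebo raises expression by 2, treatment reverts it
mu <- group_means(baseline = 5, placebo = 2, biological = -2)

# T-N recovers the biological effect; T-O is placebo + biological
effects <- drop(mu %*% contrasts)
effects
```

For this gene the T vs O contrast is zero, so the intersection approach would discard it even though the T vs N contrast equals the full biological effect.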

At the other end of the spectrum, you could actually pool your two types of control, i.e. groups <- factor(c(0,0,0,0,0,0,2,2,2)), so that you'd be including any placebo effect as part of the 'replicate' variability. Anything that survives this increase in variability is a sufficiently large biological effect that it dwarfs any placebo-induced variability.
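
A rough sketch of the pooled design, assuming an intercept parameterisation so that the second coefficient is the treated-versus-pooled-control effect. The level labels and column names are invented for readability.

```r
# Pool O and N into a single control group (sample order as in the question)
groups_pooled <- factor(c(0,0,0,0,0,0,2,2,2),
                        labels = c("control", "treated"))

# With an intercept, column 2 is the treated - control difference
design_pooled <- model.matrix(~ groups_pooled)
colnames(design_pooled) <- c("Intercept", "TvsControl")

design_pooled
# In an edgeR quasi-likelihood fit on this design, the comparison would be
# tested with glmQLFTest(fit, coef = 2).
```

Because the six control samples now share one mean, any O vs N difference is absorbed into the dispersion estimate rather than into a model coefficient.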

When I get vaguely-specified controls, I tend to apply all three approaches, and draw the expression profiles of genes that are significant in some but not all of the approaches - the experimenter then has a clear visualisation of the different hypotheses being tested.
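
One possible way to pick out the genes worth plotting, sketched in base R. The three gene lists are placeholders (hypothetical IDs); in practice they would come from the three analyses.

```r
# Hypothetical significant-gene lists from the three approaches
sig <- list(
  intersection = c("geneA", "geneB"),
  T_vs_N       = c("geneA", "geneB", "geneC"),
  pooled       = c("geneA", "geneD")
)

# Gene-by-approach membership table
all_genes <- sort(unique(unlist(sig)))
membership <- sapply(sig, function(ids) all_genes %in% ids)
rownames(membership) <- all_genes

# Genes significant in some but not all approaches
n_hits <- rowSums(membership)
discordant <- names(n_hits)[n_hits > 0 & n_hits < length(sig)]
discordant
```

The `discordant` genes are the ones whose expression profiles across O, N and T would then be drawn for the experimenter.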

Thank you for the comment. It's true that there are two rounds of error, although I think a more profound loss arises from not including the genes that are not DE against both controls.

Also, yes, there is the danger that the intersection approach ignores effects countering the placebo effect. However, the genes in question have basically the same expression level in O vs T, which makes me doubt the above.

As for pooling, I'll probably try it, but the increased variability means I would lose genes that are DE in T vs O or N but also have a big difference in N vs O. (Samples within each of the 3 groups have very similar expression.) I would also lose those that have opposite fold changes, although it is a good question whether those should be included in the original intersection approach anyway.
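
The concern about pooling can be illustrated with a toy example. The values below are made-up log2 expression levels for a single gene, and a plain two-sample t-test stands in for the edgeR model purely to show the direction of the effect.

```r
# Made-up log2 expression for one gene: tight replicates, large N vs O shift
expr <- list(O = c(5.0, 5.1, 4.9),
             N = c(8.0, 8.1, 7.9),
             T = c(11.0, 11.1, 10.9))

# Spread within each group versus spread of the pooled controls
sd_within <- sapply(expr, sd)
sd_pooled_control <- sd(c(expr$O, expr$N))

# Test statistic for T against N alone, and against the pooled controls
t_sep    <- t.test(expr$T, expr$N, var.equal = TRUE)$statistic
t_pooled <- t.test(expr$T, c(expr$O, expr$N), var.equal = TRUE)$statistic

sd_within
sd_pooled_control
c(separate = unname(t_sep), pooled = unname(t_pooled))
```

When N and O differ strongly, the pooled control group has a much larger spread than any single group, and the test statistic for the treatment shrinks accordingly.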