Question

edgeR contrasts for drug treatments and time points

sergio.martinezcuesta ▴ 10

Dear all,

I need some feedback for defining contrasts in edgeR.

We are using sequencing to count shRNAs in a screen aiming to understand the effect of drug treatment in one cell line. We work with two time points (t0 and t1) and four drug treatment levels (untreated, solvent, drug1 and drug2). The experimental setup grouped all time points and treatments in four groups (three replicates per group): t0_untreated, t1_solvent, t1_drug1 and t1_drug2. Solvent is the media used to dissolve the drugs for t1_drug1 and t1_drug2.

Our count table has one row per shRNA and 12 columns (4 groups * 3 replicates). The design matrix I use to estimate the common dispersion and fit a negative binomial GLM looks as follows:

               t0_untreated t1_solvent t1_drug1 t1_drug2
t0_untreated_1            1          0        0        0
t0_untreated_2            1          0        0        0
t0_untreated_3            1          0        0        0
t1_solvent_1              0          1        0        0
t1_solvent_2              0          1        0        0
t1_solvent_3              0          1        0        0
t1_drug1_1                0          0        1        0
t1_drug1_2                0          0        1        0
t1_drug1_3                0          0        1        0
t1_drug2_1                0          0        0        1
t1_drug2_2                0          0        0        1
t1_drug2_3                0          0        0        1
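
For reference, a design matrix of this shape can be built with a no-intercept formula. This is only a minimal sketch in base R: the object names (groups, group, design) are illustrative, and it assumes the 12 samples are ordered by group as in the table above.

```r
# Group labels, three replicates per group, in the column order of the count table.
groups <- c("t0_untreated", "t1_solvent", "t1_drug1", "t1_drug2")
group <- factor(rep(groups, each = 3), levels = groups)

# No-intercept design: one coefficient (column) per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
rownames(design) <- paste(group, rep(1:3, times = 4), sep = "_")

design
```

With one column per group, each coefficient is the (log) abundance of that group, so differences between column names can be passed directly to makeContrasts.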

We want to find the following:

(a) shRNAs specific to the t1_drug1 and t1_drug2 groups individually in comparison to t0_untreated and t1_solvent.

(b) shRNAs overlapping between t1_drug1 and t1_drug2 groups when compared to t0_untreated and t1_solvent.

So far I have used makeContrasts to define three contrasts:

contrast1 = t1_solvent - t0_untreated
contrast2 = t1_drug1 - t0_untreated
contrast3 = t1_drug2 - t0_untreated

I then ran glmLRT on each contrast individually and used topTags with a fixed FDR threshold (e.g. 10^(-3)), which gives me shRNAs for questions (a) and (b) above. For example, drug1-specific shRNAs would be the hits for contrast2, excluding hits shared with contrast1 and contrast3 at the same FDR threshold.
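
A rough sketch of that workflow is below. Several things in it are assumptions rather than part of the actual analysis: the counts are simulated stand-in data, estimateDisp is used for dispersion estimation (the real analysis may have used a common-dispersion function instead), and all object names are illustrative.

```r
library(edgeR)  # edgeR depends on limma, which provides makeContrasts()

# Hypothetical stand-in data: 1000 shRNAs x 12 samples of simulated counts.
set.seed(1)
groups <- c("t0_untreated", "t1_solvent", "t1_drug1", "t1_drug2")
group <- factor(rep(groups, each = 3), levels = groups)
counts <- matrix(rnbinom(1000 * 12, mu = 200, size = 10), ncol = 12)
rownames(counts) <- paste0("shRNA_", seq_len(nrow(counts)))

design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Normalize, estimate dispersions and fit the negative binomial GLM.
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

contrasts <- makeContrasts(
  contrast1 = t1_solvent - t0_untreated,
  contrast2 = t1_drug1 - t0_untreated,
  contrast3 = t1_drug2 - t0_untreated,
  levels = design
)

# One likelihood ratio test per contrast; keep the full, unsorted table for each.
results <- lapply(colnames(contrasts), function(cn) {
  lrt <- glmLRT(fit, contrast = contrasts[, cn])
  topTags(lrt, n = Inf, sort.by = "none")$table
})
names(results) <- colnames(contrasts)

# Hits at the FDR threshold mentioned above, then drug1-specific hits by exclusion.
hits <- lapply(results, function(tab) rownames(tab)[tab$FDR < 1e-3])
drug1_specific <- setdiff(hits$contrast2, union(hits$contrast1, hits$contrast3))
```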

However, I am wondering whether it would be possible to define other contrasts that answer (a) and (b) directly. Here are some ideas; would any of the following make sense to you?

One option for (a): specific shRNAs for drug1 and drug2 respectively:

contrast4 = (t1_drug1 - t0_untreated) - (t1_drug2 - t0_untreated) - (t1_solvent - t0_untreated)
contrast5 = (t1_drug2 - t0_untreated) - (t1_drug1 - t0_untreated) - (t1_solvent - t0_untreated)

A second option for (a):

contrast6 = t1_drug1 - 1/3*t1_drug2 - 1/3*t1_solvent - 1/3*t0_untreated
contrast7 = t1_drug2 - 1/3*t1_drug1 - 1/3*t1_solvent - 1/3*t0_untreated

In the same line of thought, for (b):

contrast8 = t1_drug1 + t1_drug2 - t1_solvent - t0_untreated

Any ideas will be useful.

Thanks,

Sergio

updated 7.5 years ago by Aaron Lun ★ 28k • written 7.5 years ago by sergio.martinezcuesta ▴ 10

score 0 · Answer 1 · 2016-11-01

The questions you're asking can't easily be bundled into a single contrast. In (a), you're asking for genes that are DE between drug 1 and solvent/untreated, but not DE between drug 2 and solvent/untreated. The null hypothesis can't be expressed here as a set of equations between coefficients, because the null for non-DE is an inequality. The same applies for (b), where the null hypothesis allows for either the drug 1 or the drug 2 comparison to be non-DE. In general, edgeR and glmLRT need to express the null as a set of equations (putting glmTreat aside, as it does something else), so they won't be able to handle these nulls. As a result, unfortunately, your proposed contrasts don't make much sense with respect to your questions of interest.
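
As an illustration of the problem, two of the proposed contrasts can be expanded in terms of the group coefficients. The symbols here are introduced only for this sketch: \beta_u, \beta_s, \beta_{d1} and \beta_{d2} stand for the coefficients of t0_untreated, t1_solvent, t1_drug1 and t1_drug2.

```latex
\begin{align*}
\text{contrast4} &= (\beta_{d1} - \beta_{u}) - (\beta_{d2} - \beta_{u}) - (\beta_{s} - \beta_{u})
                  = \beta_{d1} - \beta_{d2} - \beta_{s} + \beta_{u} \\
\text{contrast8} &= \beta_{d1} + \beta_{d2} - \beta_{s} - \beta_{u}
\end{align*}
```

If only the solvent group differs from the rest (\beta_{d1} = \beta_{d2} = \beta_u = \beta and \beta_s = \beta + \delta), contrast4 equals -\delta, so it would be rejected even though drug 1 has no specific effect. Likewise, if only drug 1 differs (\beta_{d1} = \beta + \delta, all others equal to \beta), contrast8 equals \delta, so it would be rejected even though the effect is not shared with drug 2.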

The easiest solution is to do pairwise comparisons and then intersect the results. For (a), I would identify genes that are significant in contrast 2 or 3 (at an FDR threshold of, say, 5%), and then intersect them with the genes that are definitely not significant in the other two contrasts (say, at a p-value threshold of 0.2). It's better to use a p-value threshold for the latter, because a large FDR might just be due to the severity of the correction rather than a lack of evidence. Similarly, for (b), I would identify genes that are significant in both contrasts 2 and 3, and intersect them with genes that are not significant in contrast 1. Yes, this is a bit ad hoc, but that's probably better than trying to formally test for non-DE'ness (which is very conservative and a bit arbitrary).
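
The intersection step might look something like the sketch below, in base R. The result tables are made-up toy values rather than real output, the names (res, sig, nonsig and so on) are illustrative, and "definitely not significant" is read here as a p-value above 0.2, which is an assumption about the intended direction of that threshold.

```r
# Hypothetical per-contrast result tables, one row per shRNA in the same order,
# with PValue and FDR columns (as in a topTags() table).
ids <- paste0("shRNA_", 1:5)
make_tab <- function(p, fdr) data.frame(PValue = p, FDR = fdr, row.names = ids)
res <- list(
  contrast1 = make_tab(c(0.90, 0.50, 0.001, 0.30, 0.60),
                       c(0.95, 0.70, 0.01, 0.50, 0.80)),
  contrast2 = make_tab(c(0.001, 0.002, 0.001, 0.70, 0.001),
                       c(0.01, 0.02, 0.01, 0.90, 0.01)),
  contrast3 = make_tab(c(0.80, 0.001, 0.001, 0.001, 0.10),
                       c(0.90, 0.01, 0.01, 0.01, 0.30))
)

fdr_cut <- 0.05  # threshold for calling a contrast significant
p_cut   <- 0.2   # threshold for calling a contrast "definitely not significant"

sig    <- lapply(res, function(tab) tab$FDR < fdr_cut)
nonsig <- lapply(res, function(tab) tab$PValue > p_cut)

# (a) significant for one drug, clearly non-significant in the other two contrasts.
drug1_specific <- ids[sig$contrast2 & nonsig$contrast1 & nonsig$contrast3]
drug2_specific <- ids[sig$contrast3 & nonsig$contrast1 & nonsig$contrast2]

# (b) significant for both drugs, clearly non-significant for solvent alone.
shared <- ids[sig$contrast2 & sig$contrast3 & nonsig$contrast1]
```

In this toy example, shRNA_5 is significant for drug 1 but has a p-value of 0.1 for drug 2, so it lands in neither the drug1-specific nor the shared set. That gap between the two thresholds is deliberate: features with ambiguous evidence are left unclassified.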