Question

EdgeR: Each treatment as a group versus treatment effects over all times?

Ekarl2 ▴ 80

@ekarl2-7202

Sweden

From the wonderful edgeR user manual (section 3.3.1, p. 32, and section 3.3.3, p. 34) we find two different approaches:

(1) each treatment is a different group, and tests are done between groups.

(2) treatment effects over all times (e.g. either drug conc. 1 or drug conc. 2 versus baseline).

The current dataset I am looking at has seven groups:

- 1x control.

- 3x of drug A, 3x of drug B.

...with a batch effect uncorrelated to treatment.

The different groups for the drugs are different concentrations, but the three concentrations for A are not the same as the three concentrations for B.

So far, I have just used approach (1) and compared each of the six experimental groups (3 replicates each, design matrix modeled by treatment group and batch) with the control and gotten six lists of differentially expressed genes.

Is this defensible, or would (2) have been a better choice, one for each drug? What are the pros and cons of each approach?

edgeR limma • 2.1k views

updated 8.1 years ago by Aaron Lun ★ 28k • written 8.1 years ago by Ekarl2 ▴ 80

score 0 · Answer 1 · 2016-02-29

By your second approach, I assume you're referring to a situation where the drug concentration is treated as a real-valued covariate rather than a factor. Using a real-valued covariate reduces the size of the model, as each drug now has one coefficient for concentration rather than three coefficients, one for each concentration. This provides more residual d.f. for improved variance estimation. However, this benefit is countered by the need to assume that the gene (log-)expression responds linearly to drug dosage. You could use splines to allow a non-linear fit, but that will use up residual d.f. and nullify the advantage of the covariate model. In fact, with only three unique concentrations per drug, a spline with three d.f. will be equivalent to the factor-based model anyway.
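To make the d.f. bookkeeping concrete, here is a minimal sketch in plain Python (not edgeR code) that builds the three design matrices and counts residual d.f. as samples minus design rank. Several things are assumptions for illustration only: three batches with one replicate of each group per batch, the concentration values in `conc_a` and `conc_b`, and a cubic polynomial standing in for a three-d.f. spline basis.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Matrix rank via exact Gaussian elimination on Fractions."""
    m = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(rk + 1, len(m)):
            f = m[i][col] / m[rk][col]
            if f != 0:
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk

# Hypothetical layout: 7 groups x 3 batches = 21 samples.
groups = ["ctrl", "A1", "A2", "A3", "B1", "B2", "B3"]
conc_a = {"A1": 1, "A2": 2, "A3": 4}   # made-up concentrations for drug A
conc_b = {"B1": 3, "B2": 6, "B3": 9}   # made-up concentrations for drug B
samples = list(product(groups, [1, 2, 3]))

def batch_cols(b):
    # Two indicator columns; batch 1 is the reference level.
    return [int(b == 2), int(b == 3)]

def factor_row(g, b):
    # Intercept (control) + one indicator per treated group + batch.
    return [1] + [int(g == k) for k in groups[1:]] + batch_cols(b)

def linear_row(g, b):
    # Intercept + one slope per drug + batch.
    return [1, conc_a.get(g, 0), conc_b.get(g, 0)] + batch_cols(b)

def cubic_row(g, b):
    # Intercept + cubic polynomial per drug (3 d.f. each) + batch.
    a, c = conc_a.get(g, 0), conc_b.get(g, 0)
    return [1, a, a**2, a**3, c, c**2, c**3] + batch_cols(b)

factor = [factor_row(g, b) for g, b in samples]
linear = [linear_row(g, b) for g, b in samples]
cubic = [cubic_row(g, b) for g, b in samples]

n = len(samples)
resid_df_factor = n - rank(factor)
resid_df_linear = n - rank(linear)
resid_df_cubic = n - rank(cubic)

# If stacking the two designs side by side does not raise the rank,
# they span the same column space, i.e. they are the same model.
combined = [f + c for f, c in zip(factor, cubic)]
same_space = rank(combined) == rank(factor) == rank(cubic)

print("residual d.f. (factor):", resid_df_factor)
print("residual d.f. (linear covariate):", resid_df_linear)
print("residual d.f. (cubic per drug):", resid_df_cubic)
print("factor and cubic designs equivalent:", same_space)
```

Under these assumptions the factor model leaves 12 residual d.f. and the linear covariate model leaves 16, while the three-d.f.-per-drug fit drops back to 12 and spans the same column space as the factor design.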

In short, if you've got enough residual d.f., then stick with the factor-based model. The DE contrasts are also easier to interpret when each concentration is its own factor level that can be tested against the control; in contrast, spline coefficients are quite uninterpretable. If you had many more concentrations (> 10), one could make a better argument for using a covariate model with splines, but that's not the case here.