Question

Effect size in non-standard linear model treatment experiment

Moritz E. Beber

Dear all,

I'm analyzing a large number of microarray treatment experiments. I have modeled the expression level as the dependent variable, with the continuous independent variables time and dose, as well as their interaction, as predictors. As an R formula:

~ time + dose + time:dose

Now I would like to estimate the overall effect of the treatment experiment on a gene. A reasonable measure would be the proportion of variance explained, i.e., R².

I have seen the threads "Standard error and effect size from Limma" and "Effect size similar to Cohen's d in limma", but they deal with simple treatment/control experiments, where the difference between the two group means gives the log fold change and Cohen's d is easy to compute. In my case I have three coefficients plus the intercept to take into account.

I found an answer on stats.stackexchange that shows how to convert the F-statistic to R². I'm hesitant, however, because limma uses moderated statistics and I'm not sure how the moderation affects that conversion. Maybe there is a better way altogether to estimate the effect size of the complete model.
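For an ordinary least-squares fit, the conversion follows from the definition F = (R²/df1) / ((1 - R²)/df2), where df1 is the number of tested coefficients excluding the intercept (3 for `~ time + dose + time:dose`) and df2 is the residual degrees of freedom (36 - 4 = 32 for the 36 samples described later in the thread). Rearranging gives R² = F·df1 / (F·df1 + df2). The minimal sketch below checks this against `lm()` on simulated data; the function name `f_to_r2` and the simulated gene are illustrative assumptions. The identity holds for the ordinary F-statistic only: limma's moderated F replaces each gene's residual variance with a shrunken one, so plugging it in would not, in general, reproduce the gene's sample R².

```r
# Convert an ordinary (unmoderated) overall F-statistic into R^2.
#   F = (R2 / df1) / ((1 - R2) / df2)  =>  R2 = F * df1 / (F * df1 + df2)
# df1: tested coefficients (excluding intercept); df2: residual d.f.
f_to_r2 <- function(f, df1, df2) {
  (f * df1) / (f * df1 + df2)
}

# Check against lm() on one simulated (hypothetical) gene, 36 samples:
# 4 doses x 3 times x 3 replicates, replicates stored next to each other
set.seed(1)
dat <- data.frame(
  dose = rep(rep(c(0, 16, 32, 64), each = 3), each = 3),
  time = rep(rep(c(2, 8, 24), times = 4), each = 3)
)
dat$y <- 5 + 0.02 * dat$dose + 0.01 * dat$time + rnorm(nrow(dat))

fit <- summary(lm(y ~ time + dose + time:dose, data = dat))
fstat <- fit$fstatistic  # named vector: value, numdf, dendf
r2_from_f <- f_to_r2(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]])
all.equal(r2_from_f, fit$r.squared)  # TRUE
```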

Thank you for any insights.

Tags: limma, microarray, effect size

What do you mean by "the overall effect of the treatment experiment"? Do you mean the effect of the dose factor? If you're getting an F-statistic, you must be performing some kind of DE comparison; what's your code?

Comment by Aaron Lun

My exact code is at work, but it goes something like this. I have a data frame that describes the samples. Expression was measured at 3 time points after application of the drug, for each of 4 dosage levels, so you could construct it in the following way:

data.frame(
    row.names = sprintf("X%d", 1:12),
    dose = rep(c(0, 16, 32, 64), each = 3),
    time = rep(c(2, 8, 24), times = 4)
)

The output is then:

    dose time
X1     0    2
X2     0    8
X3     0   24
X4    16    2
X5    16    8
X6    16   24
X7    32    2
X8    32    8
X9    32   24
X10   64    2
X11   64    8
X12   64   24

This is linked to the expression levels, which have been mapped to Ensembl gene IDs. For each unique combination in the table above I have 3 replicates, so there are actually 36 expression columns.

The questions I want to answer are: (1) Does a gene respond to the treatment at all (at any dosage level)? (2) How strong is the response? I want to characterize its strength (hence the effect size) so that I can compare gene responses between different drug treatment experiments. I want a single numerical indicator of a gene's response per group of experiments, which is why I used a model that includes time and dose as continuous variables over all points rather than a factorial design.

A different approach would be to use a factorial design and then find a way to summarize the effect size of each contrast such that I obtain one effect size per drug treatment experiment group.

Comment by Moritz E. Beber
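A minimal R sketch of the full 36-sample layout described in the comment above, assuming the three replicates of each dose/time condition sit in adjacent columns and using hypothetical sample names `S1` to `S36`. It builds the continuous-covariate design from the question and confirms that it leaves 32 residual d.f. and that the `time:dose` column is simply the product of time and dose.

```r
# The 12 unique dose/time conditions from the post
conditions <- data.frame(
  dose = rep(c(0, 16, 32, 64), each = 3),
  time = rep(c(2, 8, 24), times = 4)
)

# Assumption: the 3 replicates of each condition are in adjacent columns
targets <- conditions[rep(seq_len(nrow(conditions)), each = 3), ]
targets$replicate <- rep(1:3, times = nrow(conditions))
rownames(targets) <- sprintf("S%d", seq_len(nrow(targets)))  # hypothetical names

# Continuous-covariate design used in the question
design <- model.matrix(~ time + dose + time:dose, data = targets)
dim(design)                                            # 36 samples x 4 columns
nrow(design) - qr(design)$rank                         # 32 residual d.f.
all(design[, "time:dose"] == targets$time * targets$dose)  # TRUE
```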

I'm not sure it makes any sense to have an interaction term for two real-valued covariates. The entry in the design matrix ends up being the product of time and dose for each sample, which has no obvious interpretation. I'd go with a factorial design; you've got enough residual d.f. for it. Set up your contrast matrix to test for any DE between dosages at each time point, and do this for all time points. Alternatively, you could test for differential effects of time between dosages, e.g., whether the amount of DE for dose 0 between times 2 and 8 is different from that for dose 16. In any case, the F-statistic that you get out of that will represent the (time-matched) effect of treatment.

Comment by Aaron Lun
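A base-R sketch of the factorial alternative suggested above: one group per dose/time combination (12 groups, leaving 36 - 12 = 24 residual d.f.) and a contrast matrix comparing each non-zero dose with dose 0 at the same time point. The level and contrast names such as `d16.t8` and `d16vs0.t8` are hypothetical. In limma, as I understand its interface, `design` would go to `lmFit()` and `cont` to `contrasts.fit()`, followed by `eBayes()`; with several contrast columns, the result carries a moderated F-statistic (`fit$F`) that tests all of them jointly.

```r
# 36-sample layout: 4 doses x 3 times x 3 adjacent replicates (assumed order)
targets <- data.frame(
  dose = rep(rep(c(0, 16, 32, 64), each = 3), each = 3),
  time = rep(rep(c(2, 8, 24), times = 4), each = 3)
)
# One level per dose/time combination, e.g. "d16.t8"
targets$group <- factor(paste0("d", targets$dose, ".t", targets$time))
design <- model.matrix(~ 0 + group, data = targets)
colnames(design) <- levels(targets$group)
nrow(design) - ncol(design)  # 24 residual d.f.

# Contrasts: each non-zero dose versus dose 0 within each time point (9 columns)
doses <- c(16, 32, 64)
times <- c(2, 8, 24)
cont <- matrix(0, nrow = ncol(design), ncol = length(doses) * length(times),
               dimnames = list(colnames(design), NULL))
labels <- character(ncol(cont))
k <- 0
for (tm in times) {
  for (d in doses) {
    k <- k + 1
    cont[paste0("d", d, ".t", tm), k] <- 1   # treated group at this time
    cont[paste0("d0.t", tm), k] <- -1        # dose-0 group at the same time
    labels[k] <- paste0("d", d, "vs0.t", tm)
  }
}
colnames(cont) <- labels
```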

What would you then recommend for summarizing that information? Simply recording the maximum F-statistic over all contrasts?

Comment by Moritz E. Beber

Well, you can get a single F-statistic by combining all contrasts into a single matrix.

Comment by Aaron Lun
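For an ordinary (unmoderated) fit, the single F-statistic over all nine dose-versus-control contrasts equals the F from comparing the 12-group model with a model in which the group means depend on time only, because "no dose differs from dose 0 at any time point" is exactly that null hypothesis. The sketch below shows this comparison with `anova()` on one simulated gene, together with the partial R² it implies; the simulated expression values are assumptions for illustration, and limma's moderated F would differ from this value.

```r
set.seed(2)
# 36-sample layout: 4 doses x 3 times x 3 adjacent replicates (assumed order)
targets <- data.frame(
  dose = rep(rep(c(0, 16, 32, 64), each = 3), each = 3),
  time = rep(rep(c(2, 8, 24), times = 4), each = 3)
)
targets$group <- factor(paste0("d", targets$dose, ".t", targets$time))
# Hypothetical expression values for one gene
targets$y <- rnorm(nrow(targets), mean = 8 + 0.01 * targets$dose)

# Null: no dose effect at any time (group means depend on time only)
reduced <- lm(y ~ factor(time), data = targets)
# Alternative: a separate mean for every dose/time group
full <- lm(y ~ group, data = targets)

tab <- anova(reduced, full)
f_all <- tab$F[2]      # joint F over all dose-vs-control contrasts
df1 <- tab$Df[2]       # 12 - 3 = 9 tested parameters
df2 <- tab$Res.Df[2]   # 36 - 12 = 24 residual d.f.

# Partial R^2 for the dose effect, from the same F-statistic
partial_r2 <- f_all * df1 / (f_all * df1 + df2)
```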

Answer (2015-10-08)

In lieu of a more informed person giving a better answer, I suspect that there is no easy way to compute an effect size from a moderated F-statistic. The latter is intimately tied to empirical Bayes shrinkage, which affects both the estimated variances and the degrees of freedom. Increasing the sample size and the residual d.f. will generally reduce the amount of shrinkage, but it's not obvious how this affects the behaviour of R² (if at all). The easier approach would be to calculate a standard (unmoderated) F-statistic for each gene manually and use that value for your effect size calculations instead of the moderated value produced by limma.
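A vectorized sketch of that suggestion: ordinary F-statistics and partial R² for every gene at once, computed from the residual sums of squares under a full design and a nested null design via `qr.resid()`. The function name `gene_f_r2`, the simulated matrix, and the gene names are hypothetical. With the factorial design the null is the time-only model; for the continuous model in the question, one would instead pass `model.matrix(~ time + dose + time:dose, ...)` as the full design and a single column of ones as the null, in which case `partial_r2` is the ordinary R².

```r
# Ordinary (unmoderated) per-gene F-statistic and partial R^2.
# expr: genes x samples matrix; X1: full design; X0: nested null design.
gene_f_r2 <- function(expr, X1, X0) {
  # Residual sum of squares for every gene (columns of t(expr)) under design X
  rss <- function(X) colSums(qr.resid(qr(X), t(expr))^2)
  rss1 <- rss(X1)
  rss0 <- rss(X0)
  df1 <- qr(X1)$rank - qr(X0)$rank  # number of tested parameters
  df2 <- nrow(X1) - qr(X1)$rank     # residual d.f. of the full model
  f <- ((rss0 - rss1) / df1) / (rss1 / df2)
  data.frame(F = f, df1 = df1, df2 = df2, partial_r2 = 1 - rss1 / rss0,
             row.names = rownames(expr))
}

# Example on simulated data: 5 hypothetical genes, 36 samples
set.seed(3)
targets <- data.frame(
  dose = rep(rep(c(0, 16, 32, 64), each = 3), each = 3),
  time = rep(rep(c(2, 8, 24), times = 4), each = 3)
)
targets$group <- factor(paste0("d", targets$dose, ".t", targets$time))
X1 <- model.matrix(~ 0 + group, data = targets)     # 12 group means
X0 <- model.matrix(~ factor(time), data = targets)  # time-only null
expr <- matrix(rnorm(5 * nrow(targets), mean = 8), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
res <- gene_f_r2(expr, X1, X0)
res
```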