Question

edgeR: glmLRT equivalence between tests with cell means model and factor levels model

0

Entering edit mode

mmclean • 0

@mmclean-7756

Last seen 9.0 years ago

United States

I'm analyzing a 2^3 factorial design with replication with edgeR and am fitting cell means models to test differences in treatment means and also fitting factor levels models to test main effects and interactions. I am finding that results for the tests for zero main effects give the exact same results as certain tests for the equality of two treatment means. This does not make sense to me and I can't find a mistake in my code. A reproducible example follows.

library(edgeR)
n <- 10000
samps <- c("cca1","cca2","cca3","cca4","cca5","cca6","cpa1","cpa2","cpa3","cpa4",
           "cpa5","cpa6","fca1","fca2","fca3","fca4","fca5","fca6","fpa1","fpa2",
           "fpa3","fpa4","fpa5","fpa6","ccs1","ccs2","ccs3","ccs4","ccs5","ccs6",
           "cps1","cps2","cps3","cps4","cps5","cps6","fcs1","fcs2","fcs3","fcs4",
           "fcs5","fcs6","fps1","fps2","fps3","fps4","fps5","fps6")
n.chips <- length(samps)
r.counts <- matrix(rpois(n*n.chips, 10), n, n.chips)
colnames(r.counts) <- samps
f1 <- factor(substr(colnames(r.counts), 1, 1))
f2 <- factor(substr(colnames(r.counts), 2, 2))
f3 <- factor(substr(colnames(r.counts), 3, 3))
f3 <- relevel(f3, "s")

group <- factor(paste(f1, f2, f3, sep = "."))
design.cm <- model.matrix(~0+group)  # cell means model
design.fl <- model.matrix(~f1*f2*f3)  # factor levels model,

dat.new <- DGEList(counts=r.counts, group=group)
dat.new <- calcNormFactors(dat.new, method = "RLE")

dat.cm <- estimateGLMCommonDisp(dat.new, design.cm, verbose=TRUE)

dat.fl <- estimateGLMCommonDisp(dat.new, design.fl, verbose=TRUE)

fit.cm <- glmFit(dat.cm, design.cm)
fit.fl <- glmFit(dat.fl, design.fl)
colnames(fit.cm$coef)
colnames(fit.fl$coef)
lrt.cm <- glmLRT(fit.cm, contrast = c(1, -1, 0, 0, 0, 0, 0, 0))
lrt.fl <- glmLRT(fit.fl, coef = 4)
all.equal(lrt.cm$table, lrt.fl$table)

lrt.cm <- glmLRT(fit.cm, contrast = c(0, -1, 0, 1, 0, 0, 0, 0))
lrt.fl <- glmLRT(fit.fl, coef = 3)
all.equal(lrt.cm$table, lrt.fl$table)

lrt.cm <- glmLRT(fit.cm, contrast = c(0, -1, 0, 0, 0, 1, 0, 0))
lrt.fl <- glmLRT(fit.fl, coef = 2)
all.equal(lrt.cm$table, lrt.fl$table)

Can someone explain why this is happening?

edger anova • 1.5k views

ADD COMMENT • link updated 9.0 years ago by Aaron Lun ★ 28k • written 9.0 years ago by mmclean • 0

score 3 · Accepted Answer · 2015-05-17

Let's focus on the meaning of the coefficients in your interaction model. The intercept represents the expression of the c.c.a group for libraries 25 - 30, based on the fact that the intercept is the only non-zero term in the corresponding rows of design.fl. Similarly, coefficient 4 represents the difference between the c.c.s and c.c.a groups, as libraries 1 - 6 are the only rows that have non-zero terms for just the intercept and coefficient 4. While this coefficient is nominally involved in the systematic component of other groups, it will have no practical effect as changes in expression for those other groups will be absorbed by interaction terms. So, dropping coefficient 4 would give you a comparison between those two groups, exactly as you have specified in your first lrt.cm.

The same logic can be applied to all the other comparisons, where dropping each coefficient corresponds to a comparison of some other group to c.c.a, i.e., the intercept. For example, coefficient 2 represents the increase in expression over the intercept for libraries 37 - 42 (f.c.s) while coefficient 3 represents the increase in expression for libraries 31 - 36 (c.p.s). In general, testing for "main effects" in your interaction model is not interpretable unless you are sure that the interaction effect is not significant.

I find one-way layouts (i.e., the cell means model) much easier to interpret. If you want to test for some overall effect, e.g., of f1f, you can set up a comparison like this:

con <- makeContrasts((f.c.a+f.c.s+f.p.a+f.p.s)/4 - (c.c.a+c.c.s+c.p.a+c.p.s)/4, levels=design.cm)

This will give you all genes where the average expression across all groups with f differs from that across all groups with c. Alternatively, you could compare matching groups (e.g., f.c.a with c.c.a, f.c.s with c.c.s, and so on) and combine the results, either by using an ANOVA-like comparison or by running each comparison separately and intersecting the final DE lists. An ANOVA-like comparison is shown below:

con <- makeContrasts(f.c.a-c.c.a, f.c.s-c.c.s, f.p.a-c.p.a, f.p.s-c.p.s, levels=design.cm)