Question

Automatically generate contrasts from a model matrix / formula

0

Entering edit mode

Jonathan ▴ 10

@c31cf0e5

Last seen 7 months ago

United States

I wonder whether there is a function/tool/package that will generate a contrast matrix from a model-matrix or a formula, without manually specifying the compared groups.
For example, assuming 3 groups A, B, C and given a formula such as ~0 + Group + Age + Sex and/or its generated model.matrix(), I'm looking for a way to generate a contrast matrix such as

	GroupAvsB	GroupAvsC	GroupBvsC
GroupA	1	1	0
GroupB	-1	0	1
GroupC	0	-1	-1
Age	0	0	0
Sex	0	0	0

This can be even more complicated if a formula has interacting terms such as ~ 0 + Group*Sex + Age, or when there is a substantial number of groups.

The CRAN package emmeans is doing a similar parsing of a model:

pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
pairwise.table <- emmeans(pigs.lm, pairwise ~ source)[[2]]
coef(pairwise.table) %>% magrittr::set_colnames(c(colnames(.)[1], summary(pairwise.table)$contrast))

#>      source fish - soy fish - skim soy - skim
#> fish   fish          1           1          0
#> soy     soy         -1           0          1
#> skim   skim          0          -1         -1

limma • 1.1k views

ADD COMMENT • link updated 15 months ago by ATpoint ★ 5.0k • written 15 months ago by Jonathan ▴ 10

score 1 · Answer 1 · 2024-10-15

I am very concerned that you are attempting to extract pairwise comparisons from an interaction formula like 0 + Group*Sex + Age, because doing so is not statistically meaningfull. It is not a statistically correct analysis. You cannot analyse "main effects" in an interaction model without resolving the interactions first.

I have explained before on this forum that factorial interaction models are not usually helpful in genomic analyses. In your example, a much more biologically meaningful approach is usually to convert Group and Sex into a factor with 6 levels and to extract Sex-specific group comparisons. The limma makeContrasts() function makes this sort of analysis explicit and straightforward. By contrast, the statistical concepts of "main effects" do not correspond to anything that could be published in a good medical journal.

makeContrasts()` requires explicit contrasts because I think it is helpful for users to understand the comparisons they are making rather than extracting comparisons they might not understand from a complex model.

limma allows users to fit completely general linear models, more general than can be generated using R's model formulas. In general, it is not possible to predict what contrasts will be meaningful for the user, so the idea of automatically generating contrasts is simply not possible. Analyses need to be designed with the scientific hypotheses in mind. It is not the case that there is a standard universal analysis for each model formula.

In the simple case of an additive Group variable, it would be possible to automatically generate pairwise comparisons but I have been reluctant to do so. In the three group comparison you mention at the start of your question, it is the quite easy to specify the contrasts using makeContrasts() and the resulting code would be shorter and easier to understand than the emmeans code that you show. If the Group has a substantial number of levels, then I don't really want to encourage pairwise comparisons because I think that will generate a confusing analysis that is hard to interpret. I'd rather force users to think about what comparisons are most important, or perhaps do an F-test.

score 1 · Answer 2 · 2024-10-16

You only need a few lines of code to make all pairwise unique contrasts automatically. This assumes that you are using syntactically valid names all the way.

group <- c("A", "A", "B", "B", "C", "C")
sex <- c("m", "f", "m", "f", "m", "f")
age <- rnorm(6, 30, 5)

design <- model.matrix(~0 + group + sex + age + sex:age)
colnames(design) <- gsub("^group", "", colnames(design))
design
#>   A B C sexm      age sexm:age
#> 1 1 0 0    1 35.86183 35.86183
#> 2 1 0 0    0 26.52460  0.00000
#> 3 0 1 0    1 39.94285 39.94285
#> 4 0 1 0    0 35.35639  0.00000
#> 5 0 0 1    1 26.20887 26.20887
#> 6 0 0 1    0 26.35095  0.00000
#> attr(,"assign")
#> [1] 1 1 1 2 3 4
#> attr(,"contrasts")
#> attr(,"contrasts")$group
#> [1] "contr.treatment"
#> 
#> attr(,"contrasts")$sex
#> [1] "contr.treatment"

# Use combn() to get all unique pairwise combinations of the group levels
combinations_group <- combn(x = unique(group), m = 2, function(x) paste(x[1], x[2], sep = "-"))
combinations_group
#> [1] "A-B" "A-C" "B-C"

# Use limma to make the contrasts matrix based on the unique groupwise combinations and the design matrix
contrasts <- limma::makeContrasts(contrasts = combinations_group, levels = make.names(colnames(design)))
contrasts
#>           Contrasts
#> Levels     A-B A-C B-C
#>   A          1   1   0
#>   B         -1   0   1
#>   C          0  -1  -1
#>   sexm       0   0   0
#>   age        0   0   0
#>   sexm.age   0   0   0