LIMMA: ~ continuous predictor by group. How to get within-group results?
1
0
Entering edit mode
Sindre ▴ 100
@sindre-6193
Last seen 13 months ago

Cross-posted from Biostars: https://www.biostars.org/p/457094/

set.seed(1)
IDs=as.factor(c(1:20))
pheno=cbind.data.frame(IDs=IDs,
gr=as.factor(c(rep("A",10),rep("B",10))),
oc=rnorm(20)
)
dat=rbind.data.frame(varA=rnorm(20),
varB=rnorm(20),
varC=rnorm(20),
varD=rnorm(20)
)
colnames(dat)=IDs
rownames(pheno)=IDs
form=as.formula(~oc*gr)
design=model.matrix(form,data=pheno)
colnames(design)=make.names(colnames(design))
colnames(design)
[1] "X.Intercept." "oc"           "grB"          "oc.grB"

library(limma)
fit=lmFit(dat,design)
fit2=eBayes(fit)
topTable(fit2,coef=1) # Intercept = Group A when OC == 0
topTable(fit2,coef=2) # oc = slope of OC for Group A
topTable(fit2,coef=3) # grB = difference in intercept between Group B and Group A
topTable(fit2,coef=4) # Interaction results: oc.grB = difference in slope between Group B and Group A


The question is then: How do I get within group A and within group B results to further examine the nature of the interaction? Eg. 'Post-Hoc' testing: ~oc+gr twice, splitting the groups before constructing the model. I know I can re-run the analyses stratifying by group, but it seems unnecessary.

fit3 <- contrasts.fit(fit, c(1, 0, 0, -1)) # Not sure how to specify this correctly
fit3 <- eBayes(fit3)
topTable(fit3)

limma interaction • 368 views
2
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

In that formulation you can interpret the coefficients as

Intercept = Group A when OC == 0 oc = slope of OC for Group A grB = difference in intercept between Group B and Group A oc.grB = difference in slope between Group B and Group A

So to get the intercept for Group B and its slope you simply add the Intercept and grB, and to get the slope you add oc and oc.grB. Not sure why you would want the intercept, so let's concentrate on the slope.

## fake example

> library(limma)
> set.seed(0xabeef)
> z <- matrix(rnorm(1200), 100)
## try to force a significant interaction...
> z[1:10,7:9] <- z[1:10,7:9] - 2
> z[1:10,10:12] <- z[1:10,10:12] + 2
> grp <- gl(2,6)
> oc <- rnorm(12)
> design <- model.matrix(~oc*grp)

> fit <- lmFit(z, design)
> fitnocont <- eBayes(fit)
## top interacting 'gene' is #10
> topTable(fitnocont,4)
logFC     AveExpr         t     P.Value adj.P.Val         B
10  2.810583 -0.04859717  3.475649 0.001628594 0.1628594 -1.128217
7   3.439371 -0.09154699  3.154620 0.003731236 0.1865618 -1.842094
8   2.341007  0.02467877  2.991910 0.005618994 0.1872998 -2.192559
3   2.772142 -0.07539511  2.751199 0.010136632 0.2120847 -2.694125
2   2.502926 -0.06289789  2.732451 0.010604237 0.2120847 -2.732257
98  1.846969 -0.05407544  2.471004 0.019606723 0.3267787 -3.248230
24  1.564215  0.39737675  2.309226 0.028269238 0.3545001 -3.551400
47 -1.671360  0.56112280 -2.307785 0.028360006 0.3545001 -3.554040
69  1.604762  0.07991945  2.221184 0.034324866 0.3658172 -3.710700
89 -1.516999 -0.15290051 -2.153428 0.039753979 0.3658172 -3.830410
## now get the slope for grpB
> contrast <- matrix(c(0,1,0,1),4)
> fitcont <- contrasts.fit(fit, contrast)
> fitcont <- eBayes(fitcont)
## slope for gene 10, for Group B is 2.511
> topTable(fitcont)
logFC     AveExpr        t      P.Value  adj.P.Val          B
8  2.501193  0.02467877 3.811288 0.0006680289 0.04461036 -0.3976625
10 2.511630 -0.04859717 3.703171 0.0008922073 0.04461036 -0.6622559
7  3.065967 -0.09154699 3.352852 0.0022431206 0.06660769 -1.5014681
1  2.241543  0.31163451 3.286267 0.0026643078 0.06660769 -1.6573120
6  2.229692 -0.25763843 2.410511 0.0225121215 0.35721935 -3.5549365
2  1.810080 -0.06289789 2.356030 0.0254603230 0.35721935 -3.6614464
5  1.809311 -0.44420628 2.312969 0.0280346610 0.35721935 -3.7444875
3  1.947435 -0.07539511 2.304349 0.0285775482 0.35721935 -3.7609879
76 1.220782 -0.18118689 2.131452 0.0416729645 0.45089658 -4.0829253
37 1.150990  0.01963060 2.049685 0.0495564211 0.45089658 -4.2289395
## now just do this by hand
## note that you can't actually fit ~oc + grpB
> summary(lm(z[10,]~oc , subset = grp == 2))

Call:
lm(formula = z[10, ] ~ oc, subset = grp == 2)

Residuals:
7       8       9      10      11      12
-1.7879 -0.3138 -0.7352  2.0230 -0.6280  1.4419

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.7218     0.6584  -1.096   0.3345
oc            2.5116     1.0077   2.492   0.0673 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.613 on 4 degrees of freedom
Multiple R-squared:  0.6083,    Adjusted R-squared:  0.5104
F-statistic: 6.212 on 1 and 4 DF,  p-value: 0.06731

## same coefficient

0
Entering edit mode

Perfect! I know it is a different issue, but why does the within-group standard error and p-values differ when computed from the interaction model, as opposed to splitting the groups and run two separate model (as you showed in the answer)? I see that the estimate is the same. I am not afraid of the math, so just bring it on! :)

1
Entering edit mode

The standard error is the standard deviation of the residuals divided by the square root of the number of observations. If you cut your number of samples in half, you have fewer samples from which to estimate the standard deviation (which will change what you get), and you divide by a smaller number, so the estimated standard error will, all things equal, be larger. Since that's the denominator of your t-statistic, a larger denominator will give a smaller t-statistic when the numerator doesn't change.

In addition, the t-statistic will be based on half the degrees of freedom as well, so even if the t-statistic didn't change, the p-value will get larger. Plus, the degrees of freedom for the limma model fit will be larger than the lm fit because of the eBayes step. See the limma User's Guide and the relevant papers referenced therein for more information.

0
Entering edit mode

Absolutely wonderful! May I kindly ask also how to extract the results across both groups / the slope for oc across both groups?

0
Entering edit mode

0
Entering edit mode

My mistanke, my english is probably insufficient. The answer you wrote is correct. What I meant was a new question: how to get the slope for oc regarding all participants as one group, kinda like fitting ~oc+grp on the full data set.

0
Entering edit mode

By fitting ~oc+grp on the genes for which the interaction isn't significant.