Question

limma and intercept

1

Entering edit mode

Nicolas Servant ▴ 260

@nicolas-servant-1466

Last seen 2.9 years ago

France

Hi all,

I have an intercept issue that I do not understand with limma.

Let's say that I have a simple model with one group 'cancer sub-type' which contains 4 factors (TNBC, NonTNBC, HER2, Control).

I wrote a model which has either an intercept or not. If I add an intercept, the Control factor is used as reference.

In term of contrast, it means that testing 'subtypeTNBC' in the model with intercept is the same than testing 'subtypeTNBC - subtypeControl' in the model without intercept. Both tests compare the mean expression of TNBC vs Control groups.

However, if I want now to compare two other groups, let's say NonTNBC and TNBC ... using the same contrasts (as the Intercept is expected to correspond to the Control group), I have different results ...

Many thanks for your help

N

design <- model.matrix(~ 0 + subtype, data=splan)
v <- voom(y, design, plot=FALSE)
fit <- lmFit(v, design)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design)
fit <- contrasts.fit(fit, contrast)
eb.wI <- eBayes(fit)
res.wI <- topTable(eb.wI, number=1e6, adjust.method="BH")

design.noI <- model.matrix(~ subtype, data=splan)
v <- voom(y, design.noI, plot=FALSE)
fit <- lmFit(v, design.noI)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design.noI)
fit <- contrasts.fit(fit, contrast)
eb.noI <- eBayes(fit)
res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH")

limma intercept • 3.7k views

ADD COMMENT • link 6.5 years ago Nicolas Servant ▴ 260

score 0 · Answer 1 · 2018-10-31

0

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 9 hours ago

The city by the bay

Your code looks fine, so I would suspect any difference is due to an approximation in contrasts.fit - see the warning in ?contrasts.fit. Any difference should be small - the analysis with design.noI should be exact.

Edit: The warning mentions that both versions should be exact when using a one-way model, though this does not seem to be the case:

y <- matrix(rnorm(100000), ncol=10)
w <- 2^matrix(rnorm(100000), ncol=10)
g <- gl(5, 2)

X.in <- model.matrix(~g)
fit.in <- lmFit(y, X.in, weights=w)
con.in <- makeContrasts(g5 - g4, levels=X.in)
fit.in2 <- contrasts.fit(fit.in, con.in)
fit.in2 <- eBayes(fit.in2)
topTable(fit.in2)

X.out <- model.matrix(~0 + g)
fit.out <- lmFit(y, X.out, weights=w)
con.out <- makeContrasts(g5 - g4, levels=X.out)
fit.out2 <- contrasts.fit(fit.out, con.out)
fit.out2 <- eBayes(fit.out2)
topTable(fit.out2)

ADD COMMENT • link 6.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

constrasts.fit() is always exact when the original fit was a oneway layout with the "~ 0 + etc" group mean parametrization:

> topTable(fit.out2)
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

> g <- relevel(g,ref="4")
> X.in <- model.matrix(~g)
> fit.in <- lmFit(y, X.in, weights=w)
> fit.in <- eBayes(fit.in)
> topTable(fit.in, coef="g5")
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

The results from the direct fit.in model above agrees with those from your fit.out model and both are exact.

In general, contrasts.fit() is always exact when there are no weights or when the coefficients being compared in the contrast are statistically independent.

ADD REPLY • link 6.5 years ago Gordon Smyth 52k

score 0 · Answer 2 · 2018-10-31

0

Entering edit mode

Nicolas Servant ▴ 260

@nicolas-servant-1466

Last seen 2.9 years ago

France

Thanks for your feedbacks.

I agree that small differences could be observed but here, it's huge ... twice mode differentially expressed genes !

> colnames(fit$coefficients)
[1] "subtypeTNBC - subtypeNonTNBC"
> res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH", coef=1)
> head(res.noI)
logFC AveExpr t P.Value adj.P.Val B
ENSG00000091831.17 -6.217176 6.003674 -12.166565 1.662744e-12 8.190013e-08 18.26739
ENSG00000003989.12 -5.562645 5.743366 -11.666386 4.356657e-12 1.072958e-07 17.46547
ENSG00000173467.4 -7.040726 2.197144 -10.470702 4.900484e-11 8.045941e-07 14.64939
ENSG00000109436.7 -3.587246 7.342151 -9.876878 1.739701e-10 2.142268e-06 14.04120
ENSG00000151892.10 -4.852984 4.816535 -9.682996 2.656709e-10 2.617177e-06 13.52225
ENSG00000118513.14 -3.623776 3.977708 -9.382388 5.171364e-10 4.245345e-06 12.78780
> head(res.wI)
logFC AveExpr t P.Value adj.P.Val B
ENSG00000091831.17 -6.217176 6.0036736 -11.497290 6.072332e-12 2.990988e-07 16.94490
ENSG00000003989.12 -5.562645 5.7433655 -9.621528 3.041428e-10 7.490428e-06 13.26293
ENSG00000109436.7 -3.587246 7.3421515 -9.101191 9.746033e-10 1.215850e-05 12.34451
ENSG00000151892.10 -4.852984 4.8165346 -8.917986 1.481058e-09 1.215850e-05 11.81884
ENSG00000171428.9 -4.148381 2.9173688 -8.935683 1.422108e-09 1.215850e-05 11.61055
ENSG00000233823.1 -5.547515 -0.9911218 -9.030907 1.143732e-09 1.215850e-05 11.32991
> length(which(res.noI$adj.P.Val<0.05))
[1] 3507
> length(which(res.wI$adj.P.Val<0.05))
[1] 1853

ADD COMMENT • link 6.5 years ago Nicolas Servant ▴ 260

0

Entering edit mode

And when I testing any difference involving the control group, it's fine. I have exactly the same pvalues ...

That's why I do not understand the results ...

ADD REPLY • link 6.5 years ago Nicolas Servant ▴ 260

0

Entering edit mode

Aaron, does the approximation issue you mentionned correspond to the differences in the test statistics 't' ?

ADD REPLY • link 6.5 years ago Nicolas Servant ▴ 260

0

Entering edit mode

Indirectly: the approximation pertains to the unscaled standard errors of the coefficients (i.e., the log-fold change). You can see how your log-fold change estimates are the same between approaches, so you are doing the same comparison.

ADD REPLY • link 6.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

To be fair, for each individual gene the p-values from the either approach are quite close.

Anyway, to be sure that the p-values are exact, just use the 0+ parametrization ...

ADD REPLY • link 6.5 years ago Gordon Smyth 52k