limma and intercept
2
1
Entering edit mode
@nicolas-servant-1466
Last seen 2.5 years ago
France

Hi all,

I have an intercept issue that I do not understand with limma.

Let's say that I have a simple model with one group 'cancer sub-type' which contains 4 factors (TNBC, NonTNBC, HER2, Control).

I wrote a model which has either an intercept or not. If I add an intercept, the Control factor is used as reference.

In term of contrast, it means that testing 'subtypeTNBC' in the model with intercept is the same than testing 'subtypeTNBC - subtypeControl' in the model without intercept. Both tests compare the mean expression of TNBC vs Control groups.

However, if I want now to compare two other groups, let's say NonTNBC and TNBC ... using the same contrasts (as the Intercept is expected to correspond to the Control group), I have different results ...

Many thanks for your help

N

 

design <- model.matrix(~ 0 + subtype, data=splan)
v <- voom(y, design, plot=FALSE)
fit <- lmFit(v, design)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design)
fit <- contrasts.fit(fit, contrast)
eb.wI <- eBayes(fit)
res.wI <- topTable(eb.wI, number=1e6, adjust.method="BH")


design.noI <- model.matrix(~ subtype, data=splan)
v <- voom(y, design.noI, plot=FALSE)
fit <- lmFit(v, design.noI)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design.noI)
fit <- contrasts.fit(fit, contrast)
eb.noI <- eBayes(fit)
res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH")

 

limma intercept • 3.4k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 12 hours ago
The city by the bay

Your code looks fine, so I would suspect any difference is due to an approximation in contrasts.fit - see the warning in ?contrasts.fit. Any difference should be small - the analysis with design.noI should be exact.

Edit: The warning mentions that both versions should be exact when using a one-way model, though this does not seem to be the case:

y <- matrix(rnorm(100000), ncol=10)
w <- 2^matrix(rnorm(100000), ncol=10)
g <- gl(5, 2)

X.in <- model.matrix(~g)
fit.in <- lmFit(y, X.in, weights=w)
con.in <- makeContrasts(g5 - g4, levels=X.in)
fit.in2 <- contrasts.fit(fit.in, con.in)
fit.in2 <- eBayes(fit.in2)
topTable(fit.in2)

X.out <- model.matrix(~0 + g)
fit.out <- lmFit(y, X.out, weights=w)
con.out <- makeContrasts(g5 - g4, levels=X.out)
fit.out2 <- contrasts.fit(fit.out, con.out)
fit.out2 <- eBayes(fit.out2)
topTable(fit.out2)
ADD COMMENT
0
Entering edit mode

constrasts.fit() is always exact when the original fit was a oneway layout with the "~ 0 + etc" group mean parametrization:

> topTable(fit.out2)
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

> g <- relevel(g,ref="4")
> X.in <- model.matrix(~g)
> fit.in <- lmFit(y, X.in, weights=w)
> fit.in <- eBayes(fit.in)
> topTable(fit.in, coef="g5")
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

The results from the direct fit.in model above agrees with those from your fit.out model and both are exact.

In general, contrasts.fit() is always exact when there are no weights or when the coefficients being compared in the contrast are statistically independent.

ADD REPLY
0
Entering edit mode
@nicolas-servant-1466
Last seen 2.5 years ago
France

Thanks for your feedbacks.

I agree that  small differences could be observed but here, it's huge ... twice mode differentially expressed genes !

 

> colnames(fit$coefficients)
[1] "subtypeTNBC - subtypeNonTNBC"
> res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH", coef=1)
> head(res.noI)
                       logFC  AveExpr          t      P.Value    adj.P.Val        B
ENSG00000091831.17 -6.217176 6.003674 -12.166565 1.662744e-12 8.190013e-08 18.26739
ENSG00000003989.12 -5.562645 5.743366 -11.666386 4.356657e-12 1.072958e-07 17.46547
ENSG00000173467.4  -7.040726 2.197144 -10.470702 4.900484e-11 8.045941e-07 14.64939
ENSG00000109436.7  -3.587246 7.342151  -9.876878 1.739701e-10 2.142268e-06 14.04120
ENSG00000151892.10 -4.852984 4.816535  -9.682996 2.656709e-10 2.617177e-06 13.52225
ENSG00000118513.14 -3.623776 3.977708  -9.382388 5.171364e-10 4.245345e-06 12.78780
> head(res.wI)
                       logFC    AveExpr          t      P.Value    adj.P.Val        B
ENSG00000091831.17 -6.217176  6.0036736 -11.497290 6.072332e-12 2.990988e-07 16.94490
ENSG00000003989.12 -5.562645  5.7433655  -9.621528 3.041428e-10 7.490428e-06 13.26293
ENSG00000109436.7  -3.587246  7.3421515  -9.101191 9.746033e-10 1.215850e-05 12.34451
ENSG00000151892.10 -4.852984  4.8165346  -8.917986 1.481058e-09 1.215850e-05 11.81884
ENSG00000171428.9  -4.148381  2.9173688  -8.935683 1.422108e-09 1.215850e-05 11.61055
ENSG00000233823.1  -5.547515 -0.9911218  -9.030907 1.143732e-09 1.215850e-05 11.32991
> length(which(res.noI$adj.P.Val<0.05))
[1] 3507
> length(which(res.wI$adj.P.Val<0.05))
[1] 1853

 

ADD COMMENT
0
Entering edit mode

And when I testing any difference involving the control group, it's fine. I have exactly the same pvalues ...

That's why I do not understand the results ...

ADD REPLY
0
Entering edit mode

Aaron, does the approximation issue you mentionned correspond to the differences in the test statistics 't' ?

ADD REPLY
0
Entering edit mode

Indirectly: the approximation pertains to the unscaled standard errors of the coefficients (i.e., the log-fold change). You can see how your log-fold change estimates are the same between approaches, so you are doing the same comparison.

ADD REPLY
0
Entering edit mode

To be fair, for each individual gene the p-values from the either approach are quite close.

Anyway, to be sure that the p-values are exact, just use the 0+ parametrization ...

ADD REPLY

Login before adding your answer.

Traffic: 954 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6