Question: limma and intercept
1
gravatar for Nicolas Servant
10 months ago by
France
Nicolas Servant260 wrote:

Hi all,

I have an intercept issue that I do not understand with limma.

Let's say that I have a simple model with one group 'cancer sub-type' which contains 4 factors (TNBC, NonTNBC, HER2, Control).

I wrote a model which has either an intercept or not. If I add an intercept, the Control factor is used as reference.

In term of contrast, it means that testing 'subtypeTNBC' in the model with intercept is the same than testing 'subtypeTNBC - subtypeControl' in the model without intercept. Both tests compare the mean expression of TNBC vs Control groups.

However, if I want now to compare two other groups, let's say NonTNBC and TNBC ... using the same contrasts (as the Intercept is expected to correspond to the Control group), I have different results ...

Many thanks for your help

N

 

design <- model.matrix(~ 0 + subtype, data=splan)
v <- voom(y, design, plot=FALSE)
fit <- lmFit(v, design)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design)
fit <- contrasts.fit(fit, contrast)
eb.wI <- eBayes(fit)
res.wI <- topTable(eb.wI, number=1e6, adjust.method="BH")


design.noI <- model.matrix(~ subtype, data=splan)
v <- voom(y, design.noI, plot=FALSE)
fit <- lmFit(v, design.noI)
contrast <-makeContrasts(subtypeTNBC - subtypeNonTNBC, levels=design.noI)
fit <- contrasts.fit(fit, contrast)
eb.noI <- eBayes(fit)
res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH")

 

limma intercept • 345 views
ADD COMMENTlink modified 10 months ago • written 10 months ago by Nicolas Servant260
Answer: limma and intercept
0
gravatar for Aaron Lun
10 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Your code looks fine, so I would suspect any difference is due to an approximation in contrasts.fit - see the warning in ?contrasts.fit. Any difference should be small - the analysis with design.noI should be exact.

Edit: The warning mentions that both versions should be exact when using a one-way model, though this does not seem to be the case:

y <- matrix(rnorm(100000), ncol=10)
w <- 2^matrix(rnorm(100000), ncol=10)
g <- gl(5, 2)

X.in <- model.matrix(~g)
fit.in <- lmFit(y, X.in, weights=w)
con.in <- makeContrasts(g5 - g4, levels=X.in)
fit.in2 <- contrasts.fit(fit.in, con.in)
fit.in2 <- eBayes(fit.in2)
topTable(fit.in2)

X.out <- model.matrix(~0 + g)
fit.out <- lmFit(y, X.out, weights=w)
con.out <- makeContrasts(g5 - g4, levels=X.out)
fit.out2 <- contrasts.fit(fit.out, con.out)
fit.out2 <- eBayes(fit.out2)
topTable(fit.out2)
ADD COMMENTlink modified 10 months ago • written 10 months ago by Aaron Lun24k

constrasts.fit() is always exact when the original fit was a oneway layout with the "~ 0 + etc" group mean parametrization:

> topTable(fit.out2)
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

> g <- relevel(g,ref="4")
> X.in <- model.matrix(~g)
> fit.in <- lmFit(y, X.in, weights=w)
> fit.in <- eBayes(fit.in)
> topTable(fit.in, coef="g5")
         logFC     AveExpr         t      P.Value adj.P.Val         B
7663  3.919609  0.27630782  5.115955 2.805395e-05 0.2070274 2.1636498
9605  3.197820 -0.28132385  4.964980 4.140547e-05 0.2070274 1.9756614
5614  3.007006 -0.13776775  4.687533 8.481148e-05 0.2527696 1.4164484
8374 -3.653325  0.01628550 -4.589608 1.092572e-04 0.2527696 0.9661284
4207 -2.772304  0.29816848 -4.416522 1.709205e-04 0.2527696 0.8309572
5306  2.380814  0.47113940  4.381388 1.871601e-04 0.2527696 0.8081607
3095 -3.125466 -0.34076551 -4.393183 1.815435e-04 0.2527696 0.7136030
9732 -2.921570 -0.02915717 -4.280462 2.428584e-04 0.2527696 0.4229538
171   3.193572  0.73805534  4.149116 3.406531e-04 0.2527696 0.1894304
4513 -2.802365  0.10369028 -4.122553 3.647409e-04 0.2527696 0.1373498

The results from the direct fit.in model above agrees with those from your fit.out model and both are exact.

In general, contrasts.fit() is always exact when there are no weights or when the coefficients being compared in the contrast are statistically independent.

ADD REPLYlink modified 10 months ago • written 10 months ago by Gordon Smyth38k
Answer: limma and intercept
0
gravatar for Nicolas Servant
10 months ago by
France
Nicolas Servant260 wrote:

Thanks for your feedbacks.

I agree that  small differences could be observed but here, it's huge ... twice mode differentially expressed genes !

 

> colnames(fit$coefficients)
[1] "subtypeTNBC - subtypeNonTNBC"
> res.noI <- topTable(eb.noI, number=1e6, adjust.method="BH", coef=1)
> head(res.noI)
                       logFC  AveExpr          t      P.Value    adj.P.Val        B
ENSG00000091831.17 -6.217176 6.003674 -12.166565 1.662744e-12 8.190013e-08 18.26739
ENSG00000003989.12 -5.562645 5.743366 -11.666386 4.356657e-12 1.072958e-07 17.46547
ENSG00000173467.4  -7.040726 2.197144 -10.470702 4.900484e-11 8.045941e-07 14.64939
ENSG00000109436.7  -3.587246 7.342151  -9.876878 1.739701e-10 2.142268e-06 14.04120
ENSG00000151892.10 -4.852984 4.816535  -9.682996 2.656709e-10 2.617177e-06 13.52225
ENSG00000118513.14 -3.623776 3.977708  -9.382388 5.171364e-10 4.245345e-06 12.78780
> head(res.wI)
                       logFC    AveExpr          t      P.Value    adj.P.Val        B
ENSG00000091831.17 -6.217176  6.0036736 -11.497290 6.072332e-12 2.990988e-07 16.94490
ENSG00000003989.12 -5.562645  5.7433655  -9.621528 3.041428e-10 7.490428e-06 13.26293
ENSG00000109436.7  -3.587246  7.3421515  -9.101191 9.746033e-10 1.215850e-05 12.34451
ENSG00000151892.10 -4.852984  4.8165346  -8.917986 1.481058e-09 1.215850e-05 11.81884
ENSG00000171428.9  -4.148381  2.9173688  -8.935683 1.422108e-09 1.215850e-05 11.61055
ENSG00000233823.1  -5.547515 -0.9911218  -9.030907 1.143732e-09 1.215850e-05 11.32991
> length(which(res.noI$adj.P.Val<0.05))
[1] 3507
> length(which(res.wI$adj.P.Val<0.05))
[1] 1853

 

ADD COMMENTlink written 10 months ago by Nicolas Servant260

And when I testing any difference involving the control group, it's fine. I have exactly the same pvalues ...

That's why I do not understand the results ...

ADD REPLYlink modified 10 months ago • written 10 months ago by Nicolas Servant260

Aaron, does the approximation issue you mentionned correspond to the differences in the test statistics 't' ?

ADD REPLYlink written 10 months ago by Nicolas Servant260

Indirectly: the approximation pertains to the unscaled standard errors of the coefficients (i.e., the log-fold change). You can see how your log-fold change estimates are the same between approaches, so you are doing the same comparison.

ADD REPLYlink modified 10 months ago • written 10 months ago by Aaron Lun24k

To be fair, for each individual gene the p-values from the either approach are quite close.

Anyway, to be sure that the p-values are exact, just use the 0+ parametrization ...

ADD REPLYlink modified 10 months ago • written 10 months ago by Gordon Smyth38k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 170 users visited in the last hour