Question

Order of contrasts.fit call (before or after eBayes)

0

Entering edit mode

andrew.birnberg ▴ 20

@andrewbirnberg-16677

Last seen 5.7 years ago

Hello, I have run into a new issue (for me) in fitting a limma model.

The model I am fitting includes a number of covariates/additional factors, such as ages and weights of individuals associated with a given "array", several random factors, such as race/ethnicity, region, and the factor defining my primary groups of interest.

E.g. model.matrix(~ 0 + groups + race + region + scale(age) + scale(weight), data = full_dataset)

I am making contrasts from pairwise interactions of levels of the groups factor, e.g. - high - medium - medium - low - high - low

mat_data <- # defined earlier
design <- # defined earlier
contrasts <- # defined earlier

When fitting a limma model, is there any difference between the following, and if so, what exactly:

fit <- lmFit(mat_data, design)
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)

and

fit <- lmFit(mat_data, design)
fit <- eBayes(fit)
fit <- contrasts.fit(fit, contrasts)

Clearly there is some difference, since I see large differences in topTable for each coefficient, particularly in the t-statistic and P-values. However, logFC and AveExpr stay the same, so I believe that the coefficients being estimated are the same.

The only reason I stumbled up this scenario is that I wanted to check the residuals of my fit and found that after calling contrasts.fit I could no longer use residuals or fitted with the MArrayLM object. It seems that after adjusting the model with contrasts.fit, since the coefficients no longer match the design, you can't compute with the original coefficients. So, rather than computing the fit twice, I wanted to see if I could use the fit after the eBayes call to extract the residuals, and then call contrasts.fit to get the contrasts of interest.

Thank you for any help you can provide!

limma • 3.7k views

ADD COMMENT • link updated 5.7 years ago by Gordon Smyth 51k • written 5.7 years ago by andrew.birnberg ▴ 20

2

Entering edit mode

Gordon Smyth 51k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

I think you already know which order is correct. The order with contrasts.fit before eBayes is standard and correct. You've observed that the second order gives different results so you know it can't be right.

It's easy to see what's going on:

lmFit computes coefficients, residual variances and standard errors.
contrasts.fit converts the coefficients and standard errors to reflect the contrasts rather than the original design matrix, but does not compute t-statistics or p-values.
eBayes computes t-statistics and p-values from the coefficients and standard errors.

So you can see that, if you run contrasts.fit after eBayes without running eBayes again, then the object contains coefficients and standard errors matching the contrasts but t-statistics and p-values matching the original coefficients. Obviously not sensible.

If you run the code as Ryan suggested:

fit <- lmFit(mat_data, design)
fit2 <- contrasts.fit(fit, contrasts)
fit2 <- eBayes(fit2)

then you preserve both the original coefficients and the contrasts. It's no coincidence that the example code is written exactly like this on the help page for contrasts.fit and every time the contrasts.fit function is used in the limma User's Guide.

It is true that you can't compute fitted values or residuals from fit2, but I find that I hardly ever want to do that. I could have programmed constrasts.fit so that it would store both the original coefficients and the contrasts but I didn't want to make the output object any bigger than it needed to be.

ADD COMMENT • link 5.7 years ago Gordon Smyth 51k

score 2 · Accepted Answer · 2019-02-11

2

Entering edit mode

Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 4 weeks ago

Icahn School of Medicine at Mount Sinai…

I recommend saving the result of contrasts.fit as a separate object, so that you still have access to the original lmFit result as well. They both have their uses, as you have noted. Run eBayes on both of them. My typical workflow looks something like this:

fit <- lmFit(mat_data, design)
fit <- eBayes(fit)
cfit <- contrasts.fit(fit, contrasts)
cfit <- eBayes(cfit)
topTable(cfit)

ADD COMMENT • link 5.7 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

I'm not sure I'm clear on your suggestion. Is it necessary to call eBayes twice your workflow? What would be the effect in that case?

ADD REPLY • link 5.7 years ago andrew.birnberg ▴ 20

0

Entering edit mode

It's only necessary to call eBayes if you need the statistics that it calculates. This will almost always be the case for the contrast fit (since you want to call topTable on it), but sometimes you also want it for the regular fit as well. It normally only takes a second or two to run it, so I just always run it on every fit.

ADD REPLY • link 5.7 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Ok, so you're saying that the statistics reported by topTable after contrasts.fit, but before the second eBayes are based on incorrect variances? If so, what is topTable using to calculate its statistics? I had thought that only the gene-wise variances were updated by eBayes and that the coefficient variances were functions of those gene-wise variances (and that the contrasts of the coefficients were functions of those initial coefficient variances...).

ADD REPLY • link 5.7 years ago andrew.birnberg ▴ 20

1

Entering edit mode

You can read the code of the functions to find out exactly what they are doing. I did that a while ago but I don't recall the exact details. I just remember that the rule I came up with was "always run eBayes on every fit", and I haven't had any trouble since.

ADD REPLY • link 5.7 years ago Ryan C. Thompson ★ 7.9k