Hello, I have run into a new issue (for me) in fitting a limma model.
The model I am fitting includes a number of covariates/additional factors, such as ages and weights of individuals associated with a given "array", several random factors, such as race/ethnicity, region, and the factor defining my primary groups of interest.
E.g. model.matrix(~ 0 + groups + race + region + scale(age) + scale(weight), data = full_dataset)
I am making contrasts from pairwise interactions of levels of the groups factor, e.g.
- high - medium
- medium - low
- high - low
mat_data <- # defined earlier
design <- # defined earlier
contrasts <- # defined earlier
When fitting a limma model, is there any difference between the following, and if so, what exactly:
fit <- lmFit(mat_data, design)
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
and
fit <- lmFit(mat_data, design)
fit <- eBayes(fit)
fit <- contrasts.fit(fit, contrasts)
Clearly there is some difference, since I see large differences in topTable for each coefficient, particularly in the t-statistic and P-values. However, logFC and AveExpr stay the same, so I believe that the coefficients being estimated are the same.
The only reason I stumbled up this scenario is that I wanted to check the residuals of my fit and found that after calling contrasts.fit I could no longer use residuals or fitted with the MArrayLM object. It seems that after adjusting the model with contrasts.fit, since the coefficients no longer match the design, you can't compute with the original coefficients. So, rather than computing the fit twice, I wanted to see if I could use the fit after the eBayes call to extract the residuals, and then call contrasts.fit to get the contrasts of interest.
Thank you for any help you can provide!

I'm not sure I'm clear on your suggestion. Is it necessary to call
eBayestwice your workflow? What would be the effect in that case?It's only necessary to call
eBayesif you need the statistics that it calculates. This will almost always be the case for the contrast fit (since you want to calltopTableon it), but sometimes you also want it for the regular fit as well. It normally only takes a second or two to run it, so I just always run it on every fit.Ok, so you're saying that the statistics reported by
topTableaftercontrasts.fit, but before the secondeBayesare based on incorrect variances? If so, what istopTableusing to calculate its statistics? I had thought that only the gene-wise variances were updated byeBayesand that the coefficient variances were functions of those gene-wise variances (and that the contrasts of the coefficients were functions of those initial coefficient variances...).You can read the code of the functions to find out exactly what they are doing. I did that a while ago but I don't recall the exact details. I just remember that the rule I came up with was "always run
eBayeson every fit", and I haven't had any trouble since.