Hello, I have run into a new issue (for me) in fitting a limma model.
The model I am fitting includes a number of covariates/additional factors, such as ages and weights of individuals associated with a given "array", several random factors, such as race/ethnicity, region, and the factor defining my primary groups of interest.
E.g. model.matrix(~ 0 + groups + race + region + scale(age) + scale(weight), data = full_dataset)
I am making contrasts from pairwise interactions of levels of the groups
factor, e.g.
- high - medium
- medium - low
- high - low
mat_data <- # defined earlier
design <- # defined earlier
contrasts <- # defined earlier
When fitting a limma model, is there any difference between the following, and if so, what exactly:
fit <- lmFit(mat_data, design)
fit <- contrasts.fit(fit, contrasts)
fit <- eBayes(fit)
and
fit <- lmFit(mat_data, design)
fit <- eBayes(fit)
fit <- contrasts.fit(fit, contrasts)
Clearly there is some difference, since I see large differences in topTable
for each coefficient, particularly in the t-statistic and P-values. However, logFC
and AveExpr
stay the same, so I believe that the coefficients being estimated are the same.
The only reason I stumbled up this scenario is that I wanted to check the residuals of my fit and found that after calling contrasts.fit
I could no longer use residuals
or fitted
with the MArrayLM
object. It seems that after adjusting the model with contrasts.fit
, since the coefficients no longer match the design, you can't compute with the original coefficients. So, rather than computing the fit twice, I wanted to see if I could use the fit after the eBayes
call to extract the residuals, and then call contrasts.fit
to get the contrasts of interest.
Thank you for any help you can provide!
I'm not sure I'm clear on your suggestion. Is it necessary to call
eBayes
twice your workflow? What would be the effect in that case?It's only necessary to call
eBayes
if you need the statistics that it calculates. This will almost always be the case for the contrast fit (since you want to calltopTable
on it), but sometimes you also want it for the regular fit as well. It normally only takes a second or two to run it, so I just always run it on every fit.Ok, so you're saying that the statistics reported by
topTable
aftercontrasts.fit
, but before the secondeBayes
are based on incorrect variances? If so, what istopTable
using to calculate its statistics? I had thought that only the gene-wise variances were updated byeBayes
and that the coefficient variances were functions of those gene-wise variances (and that the contrasts of the coefficients were functions of those initial coefficient variances...).You can read the code of the functions to find out exactly what they are doing. I did that a while ago but I don't recall the exact details. I just remember that the rule I came up with was "always run
eBayes
on every fit", and I haven't had any trouble since.