I'm running a series of linear regressions on a subset of candidate genes using lmFit(.)
from the limma package. Two, hopefully simple questions:
1. I am trying to create a matrix of residuals controlling for another factor (sex) so I can look at the effects of a series of variables in a loop. Does it still make sense to use fit <- eBayes(fit), as I've seen recommended?
I ask because I don't entirely understand how eBayes estimates the variance, and whether the 'residuals' in this instance really make sense. I've noticed the unadjusted t- and p-values are slightly different from what I get if I just run a simple linear regression for any given probe.
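A minimal sketch of this kind of workflow (not from the original post, and with hypothetical object names: expr for a probes-by-samples matrix, pheno$sex for the factor being adjusted for):

    library(limma)

    design_sex <- model.matrix(~ sex, data = pheno)   # intercept + one sex coefficient
    fit_sex    <- lmFit(expr, design_sex)             # per-probe fit on sex only
    resid_mat  <- residuals(fit_sex, expr)            # probes x samples matrix of sex-adjusted residuals

    # limma also provides a helper often recommended for exactly this purpose:
    # resid_mat <- removeBatchEffect(expr, batch = pheno$sex)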
2. Assuming I can proceed with the above residuals and have run my analyses using the residuals matrix, I print the top results. I have some options for coef. I've read the manual but am still a bit confused. Is the first coefficient (1) the intercept? If I am interested in variable x, should I simply ask for the second coefficient (2)? If I put coef(1, 2), I get quite different p-values than with coef(2) alone, with important effects on the significance of the adjusted p-values.
BONUS QUESTION:
3. Finally, if I am interested in the relationship between methylation and a variable with several levels (i.e. low, medium, high, very high), how do I ask the basic question "are the levels different?" without asking "is low different from medium/high/very high?", a la ANOVA?
Hope these are not overly trivial questions and are helpful to someone else.
Many thanks!
Calen
The coef(2) call doesn't refer to anything that I know of. Did you mean coef=2? And coef=c(1,2)? These will obviously give you different results, as they are describing different contrasts.

Right, coef(2) was a syntax error for coef = 2, my apologies. Where I'm struggling a bit is that you say coef = 2 is not equivalent to coef = c(1, 2). Why? Isn't a two-level factor always compared to the value of the intercept in a way? So in summary(lm(model)) the intercept will be for one of the two factor levels (e.g. sex = male), and the variable sex in the model will say sexF, implying a comparison to the intercept, which here is coef = 1 (hence coef = c(1, 2))? Sorry if I'm way off here, or if I'm making weird comparisons (lm(.) to lmFit(.)), I just want to make sure I understand what is going on with the contrasts.

If you want to know which coefficient is which, you need to look at the column names of your design matrix. Unless you've done something weird or specifically excluded the intercept, the first column should be your intercept. The coef argument indicates which coefficients you are setting to zero in your null hypothesis. Assuming that your first coefficient is the intercept, you definitely don't want to test the hypothesis that your intercept is equal to zero, because this hypothesis has no biological interpretation and will be rejected for nearly 100% of probes. So you shouldn't be putting 1 in the coef argument.
Unlike lm, limma does not hide the design matrix from you, but rather requires you to manipulate it directly. This is because the hypotheses being tested with limma often involve individual coefficients from multi-level factors that cannot be expressed simply by dropping terms from the design formula, which is how one normally uses lm. So specifying your null hypothesis at the level of individual coefficients rather than design terms tends to be more convenient.
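As a hedged illustration of that coefficient-level workflow for a multi-level factor (object names hypothetical; group levels assumed to have syntactically valid names such as low/medium/high/veryhigh):

    design <- model.matrix(~ 0 + group, data = pheno)   # one column per group level, no intercept
    colnames(design) <- levels(pheno$group)

    fit  <- lmFit(expr, design)
    cont <- makeContrasts(medium - low, high - low, veryhigh - low, levels = design)
    fit2 <- eBayes(contrasts.fit(fit, cont))

    topTable(fit2, coef = 1)   # the medium - low comparison; coef = 2 and 3 give the others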
Ok awesome that makes sense, thanks @Ryan C. Thompson!
Hi,
I was wondering if you got an answer to your question no. 3. I'm struggling with the limma package to perform an analysis similar in design to a classical ANOVA. I'm looking for a way to obtain the p-value for the null hypothesis that all mean values across all groups are equal. That would allow me to filter out features before performing pairwise comparisons. Your help will be highly appreciated. Thanks, Aga
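A hedged sketch of the usual limma analogue of a one-way ANOVA F-test, assuming a multi-level factor pheno$group and a data matrix expr (hypothetical names): passing several coefficients to topTable tests them jointly with the moderated F statistic, which corresponds to the null that all group means are equal.

    design <- model.matrix(~ group, data = pheno)   # (Intercept) plus one column per non-reference level
    fit <- eBayes(lmFit(expr, design))

    # Joint (moderated F) test of all group coefficients, i.e. "all means equal" vs. "not all equal":
    topTable(fit, coef = 2:ncol(design), number = Inf)   # P.Value / adj.P.Val come from the F statistic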