I'm running a series of linear regressions on a subset of candidate genes using lmFit(.) from the limma package. Three hopefully simple questions:
1. I am trying to create a matrix of residuals that controls for another factor (sex), so that I can then test the effects of a series of variables in a loop. Does it still make sense to run fit <- eBayes(fit), as I've seen recommended?
I ask because I don't fully understand how eBayes estimates the variance, or whether the 'residuals' really make sense in this case. I've also noticed that the unadjusted t- and p-values differ slightly from those I get by running a simple linear regression on any single probe.
2. Assuming I can proceed with the residuals above, I run my analyses on the residuals matrix and print the top results. There are several options for coef(.), and although I've read the manual I am still a bit confused. Is the first coefficient (1) the intercept? If I am interested in variable x, should I simply ask for the second coefficient (2)? If I use coef = c(1, 2), I get quite different p-values than with coef = 2 alone, which substantially changes which adjusted p-values are significant.
3. Finally, if I am interested in the relationship between methylation and a variable with several levels (i.e. low, medium, high, very high), how do I ask the basic question "are the levels different?", as in an ANOVA, rather than "is low different from medium/high/very high?"
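A minimal sketch of the two-step workflow in question 1, using simulated data: meth, sex and x are hypothetical stand-ins for the real methylation matrix and covariates. It regresses sex out with lmFit(), takes the residuals with residuals(), then fits x on the residual matrix. Along the way it keeps both the pre- and post-eBayes() residuals for comparison, and computes the ordinary t-statistic (which is what a per-probe lm() reports) next to the moderated one, whose variance is shrunk by eBayes().

```r
library(limma)

# Simulated stand-in data: 50 probes x 20 samples (all names are hypothetical).
set.seed(1)
n_probes  <- 50
n_samples <- 20
sex  <- factor(rep(c("F", "M"), each = n_samples / 2))
x    <- rnorm(n_samples)
meth <- matrix(rnorm(n_probes * n_samples), nrow = n_probes,
               dimnames = list(paste0("probe", seq_len(n_probes)), NULL))

# Step 1: regress out sex and keep the residuals (probes x samples).
design_sex <- model.matrix(~ sex)
fit_sex    <- lmFit(meth, design_sex)
res        <- residuals(fit_sex, meth)

# eBayes() adds moderated statistics but does not alter the coefficients,
# so residuals computed before or after it should be identical.
fit_sex_eb <- eBayes(fit_sex)
res_eb     <- residuals(fit_sex_eb, meth)

# Step 2: test x on the residual matrix with moderated t-statistics.
design_x <- model.matrix(~ x)
fit_x    <- eBayes(lmFit(res, design_x))

# Ordinary t for probe 1 (coefficient / standard error, no variance moderation).
t_ord <- as.numeric(fit_x$coefficients[1, 2] /
                    (fit_x$stdev.unscaled[1, 2] * fit_x$sigma[1]))
t_lm  <- summary(lm(res[1, ] ~ x))$coefficients["x", "t value"]

# Moderated t for probe 1: uses the eBayes posterior variance instead of sigma^2.
t_mod <- as.numeric(fit_x$t[1, 2])
```

Because eBayes() only changes the variance estimate and degrees of freedom used in the test, t_ord matches lm() while t_mod generally differs slightly.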
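For question 2, a sketch under the assumption that the top results come from topTable() and that the design for the residual matrix is model.matrix(~ x); x and res are simulated, hypothetical stand-ins. With this design, column 1 is the intercept and column 2 is x. coef = 2 gives a moderated t-test for x alone, while coef = c(1, 2) gives a moderated F-test of whether the intercept and the x slope are both zero, which is a different null hypothesis.

```r
library(limma)

# Simulated stand-in for the residual matrix (30 probes x 20 samples) and for x.
set.seed(2)
n_samples <- 20
x   <- rnorm(n_samples)
res <- matrix(rnorm(30 * n_samples), nrow = 30,
              dimnames = list(paste0("probe", 1:30), NULL))

design <- model.matrix(~ x)
colnames(design)  # "(Intercept)" "x": coefficient 1 is the intercept, 2 is x

fit <- eBayes(lmFit(res, design))

# coef = 2 (equivalently coef = "x"): moderated t-test of H0: slope of x = 0.
tt_x <- topTable(fit, coef = 2, number = Inf, sort.by = "none")

# coef = c(1, 2): moderated F-test of H0: intercept = 0 AND slope = 0.
# A different hypothesis, so the p-values (and BH-adjusted p-values) differ.
tt_both <- topTable(fit, coef = c(1, 2), number = Inf, sort.by = "none")
```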
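For question 3, a sketch assuming a four-level factor (here a hypothetical variable called level, with simulated data). With the default treatment coding in model.matrix(~ level), coefficient 1 is the mean of the reference level (low) and coefficients 2 to 4 are the differences of medium, high and very high from low. Testing coefficients 2:4 jointly in topTable() gives a moderated F-test that all four level means are equal, the limma counterpart of a one-way ANOVA; an ordinary per-probe anova() is computed alongside for comparison.

```r
library(limma)

# Simulated stand-in data: 40 probes x 20 samples, 5 samples per level.
set.seed(3)
level <- factor(rep(c("low", "medium", "high", "very high"), each = 5),
                levels = c("low", "medium", "high", "very high"))
meth <- matrix(rnorm(40 * length(level)), nrow = 40,
               dimnames = list(paste0("probe", 1:40), NULL))

# Treatment coding: (Intercept) = low; the other three columns are
# the differences medium - low, high - low and very high - low.
design <- model.matrix(~ level)
fit <- eBayes(lmFit(meth, design))

# Moderated F-test that coefficients 2:4 are all zero,
# i.e. that all four level means are equal.
tt_levels <- topTable(fit, coef = 2:4, number = Inf, sort.by = "none")

# Unmoderated one-way ANOVA p-value for probe 1, for comparison.
aov_p <- anova(lm(meth[1, ] ~ level))[["Pr(>F)"]][1]
```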
Hope these are not overly trivial questions and are helpful to someone else.