For the past few weeks I've been posting about devising a way to build linear models and plot them using the ALL dataset. Specifically a linear model for the effects of age on gene expression. Up to this point I've only been able to develop a model for 1 individual gene (probe id) but because my R syntax is so nascent (perhaps fledgling is more appropriate), my attempt to construct a linear model for all 12625 rows falls apart. Thanks to an excellent video posted by Jeff Leek from Johns Hopkins (coursera) I was able to construct something that looks pretty close to what I'm seeking (but just for one gene):
edata=as.data.frame(exprs(ALL)) edata = as.matrix(edata) age <- pData(ALL)$age #thanks Martin for the shortcut! lm1 = lm(edata[1, ] ~ age)
So lets try this for the entire set (12625):
lm12625 = lm(edata[1:12625] ~ age)
But I get an error:
Error in model.frame.default(formula = edata[1:12625] ~ age, drop.unused.levels = TRUE) : variable lengths differ (found for 'age')
I'm also trying to figure out how I would alter the model to build one that specifically assesses the most significant effect on gene expression? Another model for insignificant age effect?
Many thanks Gordon! Let me try to break down what you've accomplished for my own understanding. First, for the !is.na(age) snippet - this is what's telling the code to create a variable (ALL2) which represents the entire (12625) matrix? Next, for the design step, you did the opposite of what I did in my example above for one gene, that is you've converted the age data.frame into a matrix (whereas I converted the exprs(ALL) matrix into a data.frame).
I don't see how the exprs(ALL) data was incorporated - is this inherently part of the limma package?
The topTable results provides a list of "Top Ten" probe_id's that were effected by age? I'm also wondering if there is a simple command for "bottom table"?
Thanks again, I very much appreciate you taking the time to show me this example!
EDIT: as a solution for getting the bottom ten, I printed out all the rows to a csv file and just looked them up. Not sure if this is the best way but it worked.