For the past few weeks I've been posting about devising a way to build linear models and plot them using the ALL dataset, specifically a linear model for the effect of age on gene expression. Up to this point I've only been able to develop a model for one individual gene (probe ID), and because my R syntax is so nascent (perhaps fledgling is more appropriate), my attempt to construct a linear model for all 12625 rows falls apart. Thanks to an excellent video posted by Jeff Leek from Johns Hopkins (Coursera), I was able to construct something that looks pretty close to what I'm seeking, but just for one gene:
edata = as.data.frame(exprs(ALL))
edata = as.matrix(edata)
age <- pData(ALL)$age  # thanks Martin for the shortcut!
lm1 = lm(edata[1, ] ~ age)
So let's try this for the entire set (all 12625 probes):
lm12625 = lm(edata[1:12625] ~ age)
But I get an error:
Error in model.frame.default(formula = edata[1:12625] ~ age, drop.unused.levels = TRUE) : variable lengths differ (found for 'age')
I'm also trying to figure out how to alter the model so that it picks out the genes whose expression is most significantly affected by age. Would I need another model for the genes with an insignificant age effect?
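For reference, here is a minimal sketch of what the error seems to point at and two possible ways around it. It is not a tested solution for ALL itself: the small `edata` matrix and `age` vector below are simulated stand-ins (assumptions for illustration), with genes in rows and samples in columns as in `exprs(ALL)`, and the names `fit_all`, `age_effect` and `ranked` are made up. The likely cause is that `edata[1:12625]` selects the first 12625 individual elements of the matrix rather than 12625 rows, so the response is no longer the same length as `age`.

```r
# Simulated stand-in for exprs(ALL): 5 genes (rows) x 8 samples (columns).
set.seed(1)
edata <- matrix(rnorm(5 * 8), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("s", 1:8)))
age <- c(20, 35, NA, 41, 53, 19, 60, 28)  # one value per sample; NAs can occur

# edata[1:5] picks the first 5 *elements* (column by column), not 5 rows,
# so its length does not match the number of samples.
length(edata[1:5])
length(age)

# Option 1: give lm() a matrix response. It fits one regression per column,
# so transpose to put genes in columns and samples in rows.
fit_all <- lm(t(edata) ~ age)
dim(coef(fit_all))  # rows: intercept and age slope; columns: genes

# Option 2: loop over the rows, keeping the age slope and its p-value.
age_effect <- t(apply(edata, 1, function(y) {
  coef(summary(lm(y ~ age)))["age", c("Estimate", "Pr(>|t|)")]
}))
colnames(age_effect) <- c("slope", "p.value")

# Adjust for multiple testing, then rank genes from strongest to weakest evidence.
age_effect <- cbind(age_effect,
                    adj.p = p.adjust(age_effect[, "p.value"], method = "BH"))
ranked <- age_effect[order(age_effect[, "p.value"]), ]
```

With this approach there would be no need for separate models for significant and insignificant genes: the same per-gene model is fitted everywhere, and the top of `ranked` holds the genes with the strongest evidence of an age effect while the bottom holds those with little or none. The Benjamini-Hochberg adjustment is one common choice, not the only one. For real microarray data, the Bioconductor package limma (`lmFit`, `eBayes`, `topTable`) is designed for this kind of gene-by-gene modelling and may be a better fit than a hand-rolled loop.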