Building linear model (age effect on gene expression)
Entering edit mode
Last seen 2.3 years ago

For the past few weeks I've been posting about devising a way to build linear models and plot them using the ALL dataset.  Specifically a linear model for the effects of age on gene expression.  Up to this point I've only been able to develop a model for 1 individual gene (probe id) but because my R syntax is so nascent (perhaps fledgling is more appropriate), my attempt to construct a linear model for all 12625 rows falls apart.  Thanks to an excellent video posted by Jeff Leek from Johns Hopkins (coursera) I was able to construct something that looks pretty close to what I'm seeking (but just for one gene):

edata = as.matrix(edata)

age <- pData(ALL)$age #thanks Martin for the shortcut!

lm1 = lm(edata[1, ] ~ age)

​So lets try this for the entire set (12625):

lm12625 = lm(edata[1:12625] ~ age)

But I get an error:

Error in model.frame.default(formula = edata[1:12625] ~ age, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'age')

I'm also trying to figure out how I would alter the model to build one that specifically assesses the most significant effect on gene expression?  Another model for insignificant age effect?

bioconductor ALL linear model • 586 views
Entering edit mode
Last seen 1 hour ago
WEHI, Melbourne, Australia
age <- pData(ALL)$age
ALL2 <- ALL[, !]
design <- model.matrix(~age)
fit <- lmFit(ALL2, design)
fit <- eBayes(fit, trend=TRUE)


> topTable(fit, coef="age")
            logFC AveExpr     t  P.Value adj.P.Val      B
40419_at  0.02507    8.37  4.87 3.37e-06    0.0425  2.446
38639_at  0.01279    6.54  4.32 3.10e-05    0.1959  0.301
336_at    0.01208    6.70  4.06 8.61e-05    0.3623 -0.677
38994_at  0.03929    7.61  3.95 1.30e-04    0.3694 -1.073
33700_at  0.02601    6.17  3.87 1.78e-04    0.3694 -1.368
32406_at -0.00922    4.70 -3.85 1.91e-04    0.3694 -1.438
40202_at  0.03820    8.02  3.81 2.17e-04    0.3694 -1.556
34519_at -0.00457    3.83 -3.79 2.34e-04    0.3694 -1.630
39373_at -0.01423    3.91 -3.73 2.92e-04    0.4093 -1.839
38167_at -0.00542    4.86 -3.67 3.61e-04    0.4563 -2.041
Entering edit mode

Many thanks Gordon!  Let me try to break down what you've accomplished for my own understanding.  First, for the ! snippet - this is what's telling the code to create a variable (ALL2) which represents the entire (12625) matrix?  Next, for the design step, you did the opposite of what I did in my example above for one gene, that is you've converted the age data.frame into a matrix (whereas I converted the exprs(ALL) matrix into a data.frame).

I don't see how the exprs(ALL) data was incorporated - is this inherently part of the limma package? 

The topTable results provides a list of "Top Ten" probe_id's that were effected by age?  I'm also wondering if there is a simple command for "bottom table"?

Thanks again, I very much appreciate you taking the time to show me this example!

EDIT: as a solution for getting the bottom ten, I printed out all the rows to a csv file and just looked them up.  Not sure if this is the best way but it worked.

> topTable(fit, coef="age","P", n="Inf")

> limma.res <-topTable(fit, coef="age","P", n="Inf")

> write.csv(limma.res,file="B.PregVsLacResults.csv",row.names=TRUE)

Login before adding your answer.

Traffic: 232 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6