Hi,
I would appreciate some feedback on how to best analyze my experiment.
Setup: 2 factors (age, diet) with each 4 levels. In total 80 arrays from mouse liver.
Research questions:
1) which genes are affected by age (regardless diet)? At which time?
2) which genes are affected by diet (regardless age)? With which diet?
3) which genes show age-diet interaction?
I was thinking of performing an ANOVA-like analysis in limma, but I face some challenges with respect to setup of the design matrix, interpretation of coefs, and post-hoc testing. I would appreciate some hints on how to best do this.
Thanks in advance,
Guido
> library(limma)
> #generate sample data (150 genes, 80 samples)
> data <- matrix(rnorm(80*150), 150)
> dim(data)
[1] 150 80
> #define 4 diet groups
> diet <- gl(n=4, k=5, length=80, labels = c("A", "B", "C", "D"))
> diet
[1] A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D D D
[39] D D A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D
[77] D D D D
Levels: A B C D
> #define 4 age groups
> age <- gl(n=4, k=20, length=80, labels= c("6m", "12m", "24m", "28m"))
> age
[1] 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m
[20] 6m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m
[39] 12m 12m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m
[58] 24m 24m 24m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m
[77] 28m 28m 28m 28m
Levels: 6m 12m 24m 28m
> # setup model
> design <- model.matrix(~0+diet+age+diet:age)
> #fit model on data
> fit <- lmFit(data,design)
> fit2 <- eBayes(fit)
> topTable(fit2)
> topTable(fit2)
dietA dietB dietC dietD age12m age24m age28m dietB.age12m
5 -0.30240402 0.5345652 0.19028497 0.45679917 -0.4091781 0.4822062 -0.26562727 -0.33466452
148 0.51694668 -0.0382692 0.22788650 -0.91171045 -0.7227267 0.4219781 -0.86774463 -0.75950749
63 0.77080071 0.3223913 -0.04903506 -1.20054429 -0.3516937 -0.2355396 -0.32732990 -0.35082948
24 -0.20266031 0.2749386 -0.02931008 0.39941753 -0.6602691 1.5677037 0.65454221 1.38768716
34 -0.28133599 -0.7619947 0.21812684 -0.01006036 0.1563202 0.9151784 -1.07782461 1.28485195
130 -0.07883237 0.6121587 -0.17856319 -0.84524491 0.5285410 0.8124192 -1.43798751 -0.92712993
52 -0.28771123 0.5399165 0.41040948 0.89498741 0.8692073 0.2766541 -0.60679816 -1.09110133
106 -0.68022520 0.4706411 0.84994880 0.55621723 1.4514266 1.2168773 0.74604696 -1.84925142
100 0.45797127 0.2133685 0.84292593 -0.01406238 -0.4969873 -1.0609057 0.07291669 -0.07017887
88 0.33736268 0.4491336 0.63686222 0.90392233 -0.5055769 -0.3652273 0.24785240 -0.56124159
dietC.age12m dietD.age12m dietB.age24m dietC.age24m dietD.age24m dietB.age28m dietC.age28m dietD.age28m
5 0.75225373 0.4355970 -0.8887096 -0.22243994 -1.47133286 1.35646596 -0.9560272 -1.19534317
148 0.24903793 1.4629530 -0.4649639 0.39227833 0.49714530 0.84160044 0.2289131 1.95765820
63 1.03421148 1.5830664 -0.8846512 -0.36576882 0.76713718 -0.95637405 0.3763394 2.03196268
24 0.96753431 0.3404788 -1.8903184 -1.27654356 -2.63919501 -0.90390444 0.4917133 -0.96336637
34 -0.61195194 1.0936938 -0.4239752 -1.24893715 -0.60029014 2.27066908 0.6381282 0.53318713
130 -0.02154839 0.9181264 -1.6898110 -0.40120176 0.03231639 0.91776955 2.2014810 2.85570246
52 -1.19028501 -2.0980625 0.1386386 -1.16051574 -1.81134979 -0.64851551 0.8162746 0.08440981
106 -2.03444931 -1.7253034 -0.5993829 -2.30345014 -2.43271714 -1.32813118 -0.6837636 -1.17779016
100 -0.55084601 0.9136853 0.8167710 -0.09413691 1.93468365 0.16852678 -1.6465432 -1.24087852
88 0.15576194 -0.6455560 -1.0180201 -0.10342100 -0.36058652 0.02217662 -0.7845387 -0.17045263
AveExpr F P.Value adj.P.Val
5 0.01389899 2.326438 0.002261782 0.3392673
148 -0.06809024 1.897208 0.017532180 0.7245234
63 -0.06554427 1.896692 0.017573013 0.7245234
24 0.22072101 1.875567 0.019320623 0.7245234
34 -0.02693657 1.821090 0.024597867 0.7379360
130 0.09597927 1.772757 0.030360397 0.7590099
52 0.08913469 1.702796 0.040900744 0.8055738
106 0.26934326 1.691054 0.042963936 0.8055738
100 0.01824946 1.582289 0.066986672 0.8889102
88 0.20946490 1.560833 0.072929099 0.8889102
>
> design
dietA dietB dietC dietD age12m age24m age28m dietB:age12m dietC:age12m dietD:age12m dietB:age24m dietC:age24m dietD:age24m dietB:age28m dietC:age28m dietD:age28m
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0
<<snip>>
77 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
78 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
79 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
80 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$diet
[1] "contr.treatment"
attr(,"contrasts")$age
[1] "contr.treatment"
>
> colnames(design)
[1] "dietA" "dietB" "dietC" "dietD" "age12m" "age24m" "age28m" "dietB:age12m" "dietC:age12m" "dietD:age12m" "dietB:age24m" "dietC:age24m" "dietD:age24m"
[14] "dietB:age28m" "dietC:age28m" "dietD:age28m"
>
As far as I understand, the intercept equals "average expression with dietA @ all time points". Coefficient 2 corresponds to the regulation due to "dietB @ at any time point" versus "dietA @ all time points [intercept]", coef=3 to regulation due to "dietC @ at any time point", and coef=4 to regulation "due to dietD @ any time point".
If I would like to know which genes is regulated by any diet regardless age (Q2), should I then go for the ANOVA-like F-test returned by:??
> topTable(fit2, coef=1:4)
dietA dietB dietC dietD AveExpr F P.Value adj.P.Val
143 -0.7910891 -0.78164517 -1.02447250 0.58613632 -0.17618218 3.308292 0.01055090 0.7568876
39 -0.9011106 0.35661965 1.04680974 -0.36164077 0.04613054 2.805475 0.02475872 0.7568876
63 0.7708007 0.32239133 -0.04903506 -1.20054429 -0.06554427 2.693608 0.02985375 0.7568876
145 -1.2371483 -0.50051088 0.47260724 0.11668265 0.06936884 2.559882 0.03728271 0.7568876
110 0.5629061 -0.40476536 1.16794608 -0.22988856 0.01254860 2.478288 0.04266014 0.7568876
71 -0.8500340 -0.10600863 1.01138455 0.41642806 0.03656495 2.462488 0.04378431 0.7568876
98 -0.1647273 0.01475745 -1.37125467 0.07573386 -0.08310678 2.444641 0.04508842 0.7568876
97 -1.1720374 0.17092449 -0.69304774 0.04033633 -0.07467623 2.441513 0.04532078 0.7568876
94 -0.7154630 -0.26528302 -1.05981806 -0.43671377 -0.17417866 2.440272 0.04541326 0.7568876
56 0.6220429 0.53225449 -0.75835250 -0.69068133 -0.02745961 2.255640 0.06139060 0.7655283
>
In other words, based on the F-test p-value of these 4 coefficients I will identify the genes that are regulated by any diet at any time point, and using decideTests() would indicate between which specific diet(s)?
But how then to do this to identify the overall time effect? I am asking because the coefficient "age6m" is missing... Or should I define another model?
And how to define e.g. the coefficient "dietB:age6m"? Or is that equal to coef=2 ("dietB") minus coef 8, 11 and 14?
