Hi,
I would appreciate some feedback on how to best analyze my experiment.
Setup: 2 factors (age, diet) with each 4 levels. In total 80 arrays from mouse liver.
Research questions:
1) which genes are affected by age (regardless diet)? At which time?
2) which genes are affected by diet (regardless age)? With which diet?
3) which genes show age-diet interaction?
I was thinking of performing an ANOVA-like analysis in limma, but I face some challenges with respect to setup of the design matrix, interpretation of coefs, and post-hoc testing. I would appreciate some hints on how to best do this.
Thanks in advance,
Guido
> library(limma) > #generate sample data (150 genes, 80 samples) > data <- matrix(rnorm(80*150), 150) > dim(data) [1] 150 80 > #define 4 diet groups > diet <- gl(n=4, k=5, length=80, labels = c("A", "B", "C", "D")) > diet [1] A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D D D [39] D D A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D [77] D D D D Levels: A B C D > #define 4 age groups > age <- gl(n=4, k=20, length=80, labels= c("6m", "12m", "24m", "28m")) > age [1] 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m 6m [20] 6m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m [39] 12m 12m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m [58] 24m 24m 24m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m [77] 28m 28m 28m 28m Levels: 6m 12m 24m 28m > # setup model > design <- model.matrix(~0+diet+age+diet:age) > #fit model on data > fit <- lmFit(data,design) > fit2 <- eBayes(fit) > topTable(fit2) > topTable(fit2) dietA dietB dietC dietD age12m age24m age28m dietB.age12m 5 -0.30240402 0.5345652 0.19028497 0.45679917 -0.4091781 0.4822062 -0.26562727 -0.33466452 148 0.51694668 -0.0382692 0.22788650 -0.91171045 -0.7227267 0.4219781 -0.86774463 -0.75950749 63 0.77080071 0.3223913 -0.04903506 -1.20054429 -0.3516937 -0.2355396 -0.32732990 -0.35082948 24 -0.20266031 0.2749386 -0.02931008 0.39941753 -0.6602691 1.5677037 0.65454221 1.38768716 34 -0.28133599 -0.7619947 0.21812684 -0.01006036 0.1563202 0.9151784 -1.07782461 1.28485195 130 -0.07883237 0.6121587 -0.17856319 -0.84524491 0.5285410 0.8124192 -1.43798751 -0.92712993 52 -0.28771123 0.5399165 0.41040948 0.89498741 0.8692073 0.2766541 -0.60679816 -1.09110133 106 -0.68022520 0.4706411 0.84994880 0.55621723 1.4514266 1.2168773 0.74604696 -1.84925142 100 0.45797127 0.2133685 0.84292593 -0.01406238 -0.4969873 -1.0609057 0.07291669 -0.07017887 88 0.33736268 0.4491336 0.63686222 0.90392233 -0.5055769 -0.3652273 0.24785240 -0.56124159 dietC.age12m dietD.age12m dietB.age24m dietC.age24m dietD.age24m dietB.age28m dietC.age28m dietD.age28m 5 0.75225373 0.4355970 -0.8887096 -0.22243994 -1.47133286 1.35646596 -0.9560272 -1.19534317 148 0.24903793 1.4629530 -0.4649639 0.39227833 0.49714530 0.84160044 0.2289131 1.95765820 63 1.03421148 1.5830664 -0.8846512 -0.36576882 0.76713718 -0.95637405 0.3763394 2.03196268 24 0.96753431 0.3404788 -1.8903184 -1.27654356 -2.63919501 -0.90390444 0.4917133 -0.96336637 34 -0.61195194 1.0936938 -0.4239752 -1.24893715 -0.60029014 2.27066908 0.6381282 0.53318713 130 -0.02154839 0.9181264 -1.6898110 -0.40120176 0.03231639 0.91776955 2.2014810 2.85570246 52 -1.19028501 -2.0980625 0.1386386 -1.16051574 -1.81134979 -0.64851551 0.8162746 0.08440981 106 -2.03444931 -1.7253034 -0.5993829 -2.30345014 -2.43271714 -1.32813118 -0.6837636 -1.17779016 100 -0.55084601 0.9136853 0.8167710 -0.09413691 1.93468365 0.16852678 -1.6465432 -1.24087852 88 0.15576194 -0.6455560 -1.0180201 -0.10342100 -0.36058652 0.02217662 -0.7845387 -0.17045263 AveExpr F P.Value adj.P.Val 5 0.01389899 2.326438 0.002261782 0.3392673 148 -0.06809024 1.897208 0.017532180 0.7245234 63 -0.06554427 1.896692 0.017573013 0.7245234 24 0.22072101 1.875567 0.019320623 0.7245234 34 -0.02693657 1.821090 0.024597867 0.7379360 130 0.09597927 1.772757 0.030360397 0.7590099 52 0.08913469 1.702796 0.040900744 0.8055738 106 0.26934326 1.691054 0.042963936 0.8055738 100 0.01824946 1.582289 0.066986672 0.8889102 88 0.20946490 1.560833 0.072929099 0.8889102 >
> design dietA dietB dietC dietD age12m age24m age28m dietB:age12m dietC:age12m dietD:age12m dietB:age24m dietC:age24m dietD:age24m dietB:age28m dietC:age28m dietD:age28m 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 <<snip>> 77 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 78 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 79 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 80 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 attr(,"assign") [1] 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3 attr(,"contrasts") attr(,"contrasts")$diet [1] "contr.treatment" attr(,"contrasts")$age [1] "contr.treatment" > > colnames(design) [1] "dietA" "dietB" "dietC" "dietD" "age12m" "age24m" "age28m" "dietB:age12m" "dietC:age12m" "dietD:age12m" "dietB:age24m" "dietC:age24m" "dietD:age24m" [14] "dietB:age28m" "dietC:age28m" "dietD:age28m" >
As far as I understand, the intercept equals "average expression with dietA @ all time points". Coefficient 2 corresponds to the regulation due to "dietB @ at any time point" versus "dietA @ all time points [intercept]", coef=3 to regulation due to "dietC @ at any time point", and coef=4 to regulation "due to dietD @ any time point".
If I would like to know which genes is regulated by any diet regardless age (Q2), should I then go for the ANOVA-like F-test returned by:??
> topTable(fit2, coef=1:4) dietA dietB dietC dietD AveExpr F P.Value adj.P.Val 143 -0.7910891 -0.78164517 -1.02447250 0.58613632 -0.17618218 3.308292 0.01055090 0.7568876 39 -0.9011106 0.35661965 1.04680974 -0.36164077 0.04613054 2.805475 0.02475872 0.7568876 63 0.7708007 0.32239133 -0.04903506 -1.20054429 -0.06554427 2.693608 0.02985375 0.7568876 145 -1.2371483 -0.50051088 0.47260724 0.11668265 0.06936884 2.559882 0.03728271 0.7568876 110 0.5629061 -0.40476536 1.16794608 -0.22988856 0.01254860 2.478288 0.04266014 0.7568876 71 -0.8500340 -0.10600863 1.01138455 0.41642806 0.03656495 2.462488 0.04378431 0.7568876 98 -0.1647273 0.01475745 -1.37125467 0.07573386 -0.08310678 2.444641 0.04508842 0.7568876 97 -1.1720374 0.17092449 -0.69304774 0.04033633 -0.07467623 2.441513 0.04532078 0.7568876 94 -0.7154630 -0.26528302 -1.05981806 -0.43671377 -0.17417866 2.440272 0.04541326 0.7568876 56 0.6220429 0.53225449 -0.75835250 -0.69068133 -0.02745961 2.255640 0.06139060 0.7655283 >
In other words, based on the F-test p-value of these 4 coefficients I will identify the genes that are regulated by any diet at any time point, and using decideTests()
would indicate between which specific diet(s)?
But how then to do this to identify the overall time effect? I am asking because the coefficient "age6m" is missing... Or should I define another model?
And how to define e.g. the coefficient "dietB:age6m"? Or is that equal to coef=2 ("dietB") minus coef 8, 11 and 14?