limma: advise needed on (nested?) design + interpretation of coefs
1
0
Entering edit mode
Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Last seen 10 hours ago
Wageningen University, Wageningen, the …

Hi,

I would appreciate some feedback on how to best analyze my experiment.

Setup: 2 factors (age, diet) with each 4 levels. In total 80 arrays from mouse liver.

Research questions:

1) which genes are affected by age (regardless diet)? At which time?

2) which genes are affected by diet (regardless age)? With which diet?

3) which genes show age-diet interaction?

 

I was thinking of performing an ANOVA-like analysis in limma, but I face some challenges with respect to setup of the design matrix, interpretation of coefs, and post-hoc testing. I would appreciate some hints on how to best do this.

Thanks in advance,

Guido

 

> library(limma)
> #generate sample data (150 genes, 80 samples)
> data <- matrix(rnorm(80*150), 150)
> dim(data)
[1] 150  80
> #define 4 diet groups
> diet <- gl(n=4, k=5, length=80, labels = c("A", "B", "C", "D"))
> diet
[1] A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D D D
[39] D D A A A A A B B B B B C C C C C D D D D D A A A A A B B B B B C C C C C D
[77] D D D D
Levels: A B C D

> #define 4 age groups
> age <- gl(n=4, k=20, length=80, labels= c("6m", "12m", "24m", "28m"))
> age
[1] 6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m  6m
[20] 6m  12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m 12m
[39] 12m 12m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m 24m
[58] 24m 24m 24m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m 28m
[77] 28m 28m 28m 28m
Levels: 6m 12m 24m 28m

> # setup model
> design <- model.matrix(~0+diet+age+diet:age)
> #fit model on data
> fit <- lmFit(data,design)
> fit2 <- eBayes(fit)
> topTable(fit2)

> topTable(fit2)
          dietA      dietB       dietC       dietD     age12m     age24m      age28m dietB.age12m
5   -0.30240402  0.5345652  0.19028497  0.45679917 -0.4091781  0.4822062 -0.26562727  -0.33466452
148  0.51694668 -0.0382692  0.22788650 -0.91171045 -0.7227267  0.4219781 -0.86774463  -0.75950749
63   0.77080071  0.3223913 -0.04903506 -1.20054429 -0.3516937 -0.2355396 -0.32732990  -0.35082948
24  -0.20266031  0.2749386 -0.02931008  0.39941753 -0.6602691  1.5677037  0.65454221   1.38768716
34  -0.28133599 -0.7619947  0.21812684 -0.01006036  0.1563202  0.9151784 -1.07782461   1.28485195
130 -0.07883237  0.6121587 -0.17856319 -0.84524491  0.5285410  0.8124192 -1.43798751  -0.92712993
52  -0.28771123  0.5399165  0.41040948  0.89498741  0.8692073  0.2766541 -0.60679816  -1.09110133
106 -0.68022520  0.4706411  0.84994880  0.55621723  1.4514266  1.2168773  0.74604696  -1.84925142
100  0.45797127  0.2133685  0.84292593 -0.01406238 -0.4969873 -1.0609057  0.07291669  -0.07017887
88   0.33736268  0.4491336  0.63686222  0.90392233 -0.5055769 -0.3652273  0.24785240  -0.56124159
    dietC.age12m dietD.age12m dietB.age24m dietC.age24m dietD.age24m dietB.age28m dietC.age28m dietD.age28m
5     0.75225373    0.4355970   -0.8887096  -0.22243994  -1.47133286   1.35646596   -0.9560272  -1.19534317
148   0.24903793    1.4629530   -0.4649639   0.39227833   0.49714530   0.84160044    0.2289131   1.95765820
63    1.03421148    1.5830664   -0.8846512  -0.36576882   0.76713718  -0.95637405    0.3763394   2.03196268
24    0.96753431    0.3404788   -1.8903184  -1.27654356  -2.63919501  -0.90390444    0.4917133  -0.96336637
34   -0.61195194    1.0936938   -0.4239752  -1.24893715  -0.60029014   2.27066908    0.6381282   0.53318713
130  -0.02154839    0.9181264   -1.6898110  -0.40120176   0.03231639   0.91776955    2.2014810   2.85570246
52   -1.19028501   -2.0980625    0.1386386  -1.16051574  -1.81134979  -0.64851551    0.8162746   0.08440981
106  -2.03444931   -1.7253034   -0.5993829  -2.30345014  -2.43271714  -1.32813118   -0.6837636  -1.17779016
100  -0.55084601    0.9136853    0.8167710  -0.09413691   1.93468365   0.16852678   -1.6465432  -1.24087852
88    0.15576194   -0.6455560   -1.0180201  -0.10342100  -0.36058652   0.02217662   -0.7845387  -0.17045263
        AveExpr        F     P.Value adj.P.Val
5    0.01389899 2.326438 0.002261782 0.3392673
148 -0.06809024 1.897208 0.017532180 0.7245234
63  -0.06554427 1.896692 0.017573013 0.7245234
24   0.22072101 1.875567 0.019320623 0.7245234
34  -0.02693657 1.821090 0.024597867 0.7379360
130  0.09597927 1.772757 0.030360397 0.7590099
52   0.08913469 1.702796 0.040900744 0.8055738
106  0.26934326 1.691054 0.042963936 0.8055738
100  0.01824946 1.582289 0.066986672 0.8889102
88   0.20946490 1.560833 0.072929099 0.8889102
>
> design
   dietA dietB dietC dietD age12m age24m age28m dietB:age12m dietC:age12m dietD:age12m dietB:age24m dietC:age24m dietD:age24m dietB:age28m dietC:age28m dietD:age28m
1      1     0     0     0      0      0      0            0            0            0            0            0            0            0            0            0
2      1     0     0     0      0      0      0            0            0            0            0            0            0            0            0            0
3      1     0     0     0      0      0      0            0            0            0            0            0            0            0            0            0
4      1     0     0     0      0      0      0            0            0            0            0            0            0            0            0            0
            0            0            0            0            0            0
<<snip>>
77     0     0     0     1      0      0      1            0            0            0            0            0            0            0            0            1
78     0     0     0     1      0      0      1            0            0            0            0            0            0            0            0            1
79     0     0     0     1      0      0      1            0            0            0            0            0            0            0            0            1
80     0     0     0     1      0      0      1            0            0            0            0            0            0            0            0            1
attr(,"assign")
[1] 1 1 1 1 2 2 2 3 3 3 3 3 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$diet
[1] "contr.treatment"

attr(,"contrasts")$age
[1] "contr.treatment"

>

> colnames(design)
[1] "dietA"        "dietB"        "dietC"        "dietD"        "age12m"       "age24m"       "age28m"       "dietB:age12m" "dietC:age12m" "dietD:age12m" "dietB:age24m" "dietC:age24m" "dietD:age24m"
[14] "dietB:age28m" "dietC:age28m" "dietD:age28m"
>

 

As far as I understand, the intercept equals "average expression with dietA @ all time points". Coefficient 2 corresponds to the regulation due to "dietB @ at any time point" versus "dietA @ all time points [intercept]", coef=3 to regulation due to "dietC @ at any time point", and coef=4 to regulation "due to dietD @ any time point".

If I would like to know which genes is regulated by any diet regardless age (Q2), should I then go for the ANOVA-like F-test returned by:??

> topTable(fit2, coef=1:4)
         dietA       dietB       dietC       dietD     AveExpr        F    P.Value adj.P.Val
143 -0.7910891 -0.78164517 -1.02447250  0.58613632 -0.17618218 3.308292 0.01055090 0.7568876
39  -0.9011106  0.35661965  1.04680974 -0.36164077  0.04613054 2.805475 0.02475872 0.7568876
63   0.7708007  0.32239133 -0.04903506 -1.20054429 -0.06554427 2.693608 0.02985375 0.7568876
145 -1.2371483 -0.50051088  0.47260724  0.11668265  0.06936884 2.559882 0.03728271 0.7568876
110  0.5629061 -0.40476536  1.16794608 -0.22988856  0.01254860 2.478288 0.04266014 0.7568876
71  -0.8500340 -0.10600863  1.01138455  0.41642806  0.03656495 2.462488 0.04378431 0.7568876
98  -0.1647273  0.01475745 -1.37125467  0.07573386 -0.08310678 2.444641 0.04508842 0.7568876
97  -1.1720374  0.17092449 -0.69304774  0.04033633 -0.07467623 2.441513 0.04532078 0.7568876
94  -0.7154630 -0.26528302 -1.05981806 -0.43671377 -0.17417866 2.440272 0.04541326 0.7568876
56   0.6220429  0.53225449 -0.75835250 -0.69068133 -0.02745961 2.255640 0.06139060 0.7655283
>

In other words, based on the F-test p-value of these 4 coefficients I will identify the genes that are regulated by any diet at any time point, and using decideTests() would indicate between which specific diet(s)?

 

But how then to do this to identify the overall time effect? I am asking because the coefficient "age6m"  is missing... Or should I define another model?

And how to define e.g. the coefficient "dietB:age6m"? Or is that equal to coef=2 ("dietB") minus coef 8, 11 and 14?

 

 

 

 

 

limm multiple factor design limma design matrix • 1.1k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 24 minutes ago
WEHI, Melbourne, Australia

It is good that you have formulated your research questions. Finding genes that show age-diet interaction is easy, and I assume you already know how to do that.

The reason you are having trouble answering the other two questions is that the questions you have asked don't have answers. It is easy find genes that are affected by age for each diet. But, if the genes that are DE with respect to age depend on diet, how can you find genes affected by age regardless of diet? It isn't a meaningful question to ask.

Similarly, you can easily find genes affected by diet for each age. But you can't find genes affected by diet regardless of of age, because age may change the effects of diet.

As an aside, the only situation in which it is useful to include "0+" in the model formula is when treating the experiment as a oneway layout. It doesn't help in a factorial model formula as you have defined.

As another aside, I have long tried to disuade people from using factorial model formulas to analyse microarray experiments, because such models don't answer the questions people usually want to answer.

ADD COMMENT

Login before adding your answer.

Traffic: 790 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6