**0**wrote:

Hello All,

I'm currently working through an analysis of 96 RNAseq samples generated from a rather complicated experimental design and need some help constructing my design matrices. Briefly, the experiment involves repeated measures of 12 individuals across two experimental conditions one week apart (No Stress and Stress). Furthermore, individuals are split into two groups, "Risk" or "Control. So factors are as follows:

**Participant**: 12 levels (1-12); **Status**: 2 levels [Risk (N=6) or Control (N=6)]; **Session**: 2 levels (No_Stress or Stress)

**Time**: 4 levels (1-4, sample times are the same in both sessions, e.g. No_Stress.1 = Stress.1)

This is summarized in the table below.

Participant | Status | Session | Time | Group |
---|---|---|---|---|

1 | Control | No_Stress | 1 | No_Stress.1 |

1 | Control | No_Stress | 2 | No_Stress.2 |

1 | Control | No_Stress | 3 | No_Stress.3 |

1 | Control | No_Stress | 4 | No_Stress.4 |

1 | Control | Stress | 1 | Stress.1 |

1 | Control | Stress | 2 | Stress.2 |

1 | Control | Stress | 3 | Stress.3 |

1 | Control | Stress | 4 | Stress.4 |

... |
||||

12 | Risk | No_Stress | 1 | No_Stress.1 |

12 | Risk | No_Stress | 2 | No_Stress.2 |

12 | Risk | No_Stress | 3 | No_Stress.3 |

12 | Risk | No_Stress | 4 | No_Stress. |

12 | Risk | Stress | 1 | Stress.1 |

12 | Risk | Stress | 2 | Stress.2 |

12 | Risk | Stress | 3 | Stress.3 |

12 | Risk | Stress | 4 | Stress.4 |

I'm interested in the following hypotheses:

i) *Which genes are responsive to time (i.e. circadian)?*

ii) *Which genes are responsive to stress? *

iii) *Which genes are deferentially responsive to stress as a function of status (Risk versus Control)?*

**FILTERING AND NORMALIZATION STEPS:**

X <- DGEList(counts=counts, genes=rownames(counts)) keep <- rowSums(cpm(X)>=1) >=2 X_filtered <- X[keep,keep.lib.sizes=FALSE] X_filter_norm <- calcNormFactors(X_filtered) X_filter_norm <- estimateGLMCommonDisp(X_filter_norm, method="deviance",robust=TRUE,subset=NULL)

I have no true replicates so I used "estimateGLMCommonDisp" with the settings above as was recommended in Section 2.11 of the edgeR User Guide.

**QUESTION 1: **Is there any preference to using the combined factor "Group" versus nesting Time within Session.

In other words is

time_matrix1 <- model.matrix(~0+Group)

preferred over

time_matrix2 <- model.matrix(~0+Session+Session:Time)

*For Questions 2 & 3 I'm assuming using Group is best, but if not I would replace it with "Session+Session:Time" accordingly below.*

**QUESTION 2**: I don't necessarily care about differences at the participant level, but I would like to control for them. Does my inclusion of Participant as a factor in my design matrix take care of this? For example is the model below sufficient to tests hypotheses i and ii?

time_matrix3 <- model.matrix (~0+Group+Participant) fit <- glmFit(X_filter_norm, time_matrix3)

Or do I need to nest the combined factor within Participant?

time_matrix4 <- model.matrix(~0+Participant+Participant:Group)

**QUESTION 3: **Is the fully nested model below appropriate for hypothesis iii?

full_matrix <- model.matrix(~Status + Status:Participant + Status:Participant:Group)

Or can I removed the Participant factor and have the matrix below instead?

full_matrix <- model.matrix(~Status + Status:Group)

ANY help or guidance would be much appreciated! I've gotten this far by piecing together different strategies I've read on other posts and in user manuals, but I still haven't found a solid answer or example of a design this complicated.

Thanks!

**6.7k**• written 6 weeks ago by wjh180 •

**0**