Question

Repeated Measures, Within-Subject, Between-Groups RNAseq Analysis

0

Entering edit mode

wjh180 • 0

@wjh180-15427

Last seen 7.0 years ago

Hello All,

I'm currently working through an analysis of 96 RNAseq samples generated from a rather complicated experimental design and need some help constructing my design matrices. Briefly, the experiment involves repeated measures of 12 individuals across two experimental conditions one week apart (No Stress and Stress). Furthermore, individuals are split into two groups, "Risk" or "Control. So factors are as follows:

Participant: 12 levels (1-12); Status: 2 levels [Risk (N=6) or Control (N=6)]; Session: 2 levels (No_Stress or Stress)

Time: 4 levels (1-4, sample times are the same in both sessions, e.g. No_Stress.1 = Stress.1)

This is summarized in the table below.

Participant	Status	Session	Time	Group
1	Control	No_Stress	1	No_Stress.1
1	Control	No_Stress	2	No_Stress.2
1	Control	No_Stress	3	No_Stress.3
1	Control	No_Stress	4	No_Stress.4
1	Control	Stress	1	Stress.1
1	Control	Stress	2	Stress.2
1	Control	Stress	3	Stress.3
1	Control	Stress	4	Stress.4
...
12	Risk	No_Stress	1	No_Stress.1
12	Risk	No_Stress	2	No_Stress.2
12	Risk	No_Stress	3	No_Stress.3
12	Risk	No_Stress	4	No_Stress.
12	Risk	Stress	1	Stress.1
12	Risk	Stress	2	Stress.2
12	Risk	Stress	3	Stress.3
12	Risk	Stress	4	Stress.4

I'm interested in the following hypotheses:

i) Which genes are responsive to time (i.e. circadian)?

ii) Which genes are responsive to stress?

iii) Which genes are deferentially responsive to stress as a function of status (Risk versus Control)?

FILTERING AND NORMALIZATION STEPS:

X <- DGEList(counts=counts, genes=rownames(counts))
keep <- rowSums(cpm(X)>=1) >=2
X_filtered <- X[keep,keep.lib.sizes=FALSE]
X_filter_norm <- calcNormFactors(X_filtered)
X_filter_norm <- estimateGLMCommonDisp(X_filter_norm, method="deviance",robust=TRUE,subset=NULL)

I have no true replicates so I used "estimateGLMCommonDisp" with the settings above as was recommended in Section 2.11 of the edgeR User Guide.

QUESTION 1: Is there any preference to using the combined factor "Group" versus nesting Time within Session.

In other words is

time_matrix1 <- model.matrix(~0+Group)

preferred over

time_matrix2 <- model.matrix(~0+Session+Session:Time)

For Questions 2 & 3 I'm assuming using Group is best, but if not I would replace it with "Session+Session:Time" accordingly below.

QUESTION 2: I don't necessarily care about differences at the participant level, but I would like to control for them. Does my inclusion of Participant as a factor in my design matrix take care of this? For example is the model below sufficient to tests hypotheses i and ii?

time_matrix3 <- model.matrix (~0+Group+Participant)
fit <- glmFit(X_filter_norm, time_matrix3)

Or do I need to nest the combined factor within Participant?

time_matrix4 <- model.matrix(~0+Participant+Participant:Group)

QUESTION 3: Is the fully nested model below appropriate for hypothesis iii?

full_matrix <- model.matrix(~Status + Status:Participant + Status:Participant:Group)

Or can I removed the Participant factor and have the matrix below instead?

full_matrix <- model.matrix(~Status + Status:Group)

ANY help or guidance would be much appreciated! I've gotten this far by piecing together different strategies I've read on other posts and in user manuals, but I still haven't found a solid answer or example of a design this complicated.

Thanks!

edgeR rnaseq repeated measures • 2.1k views

ADD COMMENT • link updated 7.8 years ago by Ryan C. Thompson ★ 7.9k • written 7.8 years ago by wjh180 • 0

score 3 · Answer 1 · 2018-04-09

For Question 1, the two designs are equivalent, and you should use whichever one you find more convenient.

For Question 2, I would go with your first design, ~0+Group+Participant. The second design you suggest would fit a separate group effect for every participant, which you have stated is not your goal (and furthermore this design would have as many coefficients as samples and therefore be impossible to fit).

For Question 3, you cannot include both Participant and Status as fixed effects in the same design, because the first is nested within the second. However, omitting Participant entirely is not ideal either. The recommended way to handle this design is to use limma's duplicateCorrelation to fit a mixed model with Participant as a random effect. You'll have to switch from edgeR to voom and limma to do this. (If you want to do this, I recommend you switch the entire analysis to limma so that you can compare results consistently between different models.)

Lastly, there's no reason for you to use estimateGLMCommonDisp. Just use estimateDisp. You do have replicates: you have multiple participants in each status group. Generally, "no replicates" means that your design matrix has as many columns as you have samples, so your model has zero residual degrees of freedom. That's not the case for you unless you want to use time_matrix4, which I recommended against.