Setting contrasts in edgeR for time-series data
@wangzhang1988-13198

Hello,

I'm using edgeR to analyze time-series RNA-seq data with repeated measures at six time points, corresponding to pre-treatment, on-treatment and post-treatment phases:

Timepoint1    Pretreat
Timepoint2    Pretreat
Timepoint3    Pretreat
Timepoint4    Ontreat
Timepoint5    Ontreat
Timepoint6    Posttreat

I want to find differentially expressed genes (DEGs) for two comparisons: Ontreat vs. Pretreat, and Posttreat vs. Ontreat.

There are two options I can think of to do this:

1) Include the timepoint variable in the GLM (plus subject, since these are repeated-measures data) and set the contrasts as:

design<-model.matrix(~0+timepoint+subject)

mycontrast<-makeContrasts(OnvsPre=(timepoint4+timepoint5)/2-(timepoint1+timepoint2+timepoint3)/3, PostvsOn=timepoint6-(timepoint4+timepoint5)/2, levels=design)
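To make explicit what these contrasts encode, here is a minimal base R sketch that builds the same two contrast vectors by hand. It assumes a hypothetical, balanced sample sheet (three subjects, each sampled at all six time points) and that `timepoint` is a factor with levels 1 to 6, which is what makes `model.matrix` name the columns `timepoint1` ... `timepoint6` so the names in `makeContrasts()` (from limma, which edgeR builds on) resolve. Each contrast's weights sum to zero, and the subject columns get weight 0, so subject effects cancel out of both comparisons.

```r
# Hypothetical sample sheet: 3 subjects, each sampled at all six time points.
# Factor levels 1..6 are assumed so that model.matrix() names the columns
# timepoint1 ... timepoint6, matching the names used in makeContrasts().
samples <- data.frame(
  subject   = factor(rep(1:3, each = 6)),
  timepoint = factor(rep(1:6, times = 3))
)
design <- model.matrix(~0 + timepoint + subject, data = samples)

# OnvsPre: average of time points 4-5 minus average of time points 1-3.
onvspre <- setNames(numeric(ncol(design)), colnames(design))
onvspre[paste0("timepoint", 4:5)] <- 1/2
onvspre[paste0("timepoint", 1:3)] <- -1/3

# PostvsOn: time point 6 minus average of time points 4-5.
postvson <- setNames(numeric(ncol(design)), colnames(design))
postvson["timepoint6"] <- 1
postvson[paste0("timepoint", 4:5)] <- -1/2

# Columns = contrasts, rows = design coefficients (subject rows stay 0).
mycontrast <- cbind(OnvsPre = onvspre, PostvsOn = postvson)
print(round(mycontrast, 3))
```

The printed matrix should match what `makeContrasts(..., levels=design)` produces for the same expressions.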

2) Include the treatment-phase variable in the model instead, which effectively pools the time points within each phase into one group:

design<-model.matrix(~0+treatment+subject)
colnames(design)<-sub("^treatment", "", colnames(design))
mycontrast<-makeContrasts(OnvsPre=Ontreat-Pretreat,PostvsOn=Posttreat-Ontreat,levels=design)


Since I am new to RNA-seq analysis and have little background in statistics, my questions are:

1) For the first method, am I setting up the contrasts correctly?

2) For the second method, is it justified to pool different time points into one group, or does that run into problems because of the repeated measures?

Thanks very much for any help here!


rnaseq edger makecontrasts
Aaron Lun
@alun

Use the first model. If there are differences between time points in the same treatment category, this will lead to inflation of the dispersion estimates when you try to treat those time points as "replicates" in the second model.
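The dispersion-inflation point can be illustrated with a small base R simulation. This is only an analogy: it uses a Gaussian linear model (`lm`) on a log-expression scale as a stand-in for edgeR's negative binomial GLM, with the residual variance playing the role of the dispersion. The data are hypothetical: one gene, four subjects, and an assumed real difference between time points 4 and 5 even though both are "Ontreat". The phase model cannot explain that difference, so it ends up in the residual variance.

```r
set.seed(1)
# Hypothetical toy data for one gene on a log-expression scale:
# 4 subjects x 6 time points; time point 5 is assumed to differ from
# time point 4 even though both belong to the Ontreat phase.
n_subj <- 4
samples <- data.frame(
  subject   = factor(rep(seq_len(n_subj), each = 6)),
  timepoint = factor(rep(1:6, times = n_subj))
)
phase_of_tp <- c("Pretreat", "Pretreat", "Pretreat", "Ontreat", "Ontreat", "Posttreat")
samples$treatment <- factor(phase_of_tp[samples$timepoint])

tp_means <- c(5, 5, 5, 6, 8, 6)       # assumed true mean per time point
subj_eff <- rnorm(n_subj, sd = 0.5)    # assumed subject effects
samples$y <- tp_means[samples$timepoint] + subj_eff[samples$subject] +
  rnorm(nrow(samples), sd = 0.3)       # true residual SD is 0.3

fit_time  <- lm(y ~ 0 + timepoint + subject, data = samples)  # first model
fit_phase <- lm(y ~ 0 + treatment + subject, data = samples)  # second model

# Residual variance: the Gaussian analogue of a dispersion estimate.
c(timepoint_model = sigma(fit_time)^2, phase_model = sigma(fit_phase)^2)
```

The timepoint model's residual variance should be close to the true 0.3^2 = 0.09, while the phase model's is inflated by the unmodeled time point 4 vs. 5 difference, which would cost power in the DE tests.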

The contrasts for the first model look fine to me.
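As a check on what the first model's OnvsPre contrast estimates, the base R sketch below fits the same `~0 + timepoint + subject` design with `lm` on hypothetical balanced data (every subject observed at every time point). In that setting the contrast of fitted coefficients equals the average of the time point 4-5 means minus the average of the time point 1-3 means, because the subject effects cancel. In edgeR the coefficients are on the natural-log scale of a negative binomial GLM, so the averaging happens on log expression and the identity is only approximate there; this Gaussian toy is an assumption made for illustration.

```r
set.seed(2)
# Hypothetical balanced toy data (one gene, log-expression scale):
# 3 subjects x 6 time points, every subject observed at every time point.
samples <- data.frame(
  subject   = factor(rep(1:3, each = 6)),
  timepoint = factor(rep(1:6, times = 3))
)
samples$y <- rnorm(nrow(samples), mean = 5)

fit <- lm(y ~ 0 + timepoint + subject, data = samples)
b   <- coef(fit)

# OnvsPre contrast applied to the fitted timepoint coefficients
est_contrast <- mean(b[paste0("timepoint", 4:5)]) - mean(b[paste0("timepoint", 1:3)])

# The same quantity computed from raw per-time-point averages
tp_avg     <- tapply(samples$y, samples$timepoint, mean)
est_manual <- mean(tp_avg[4:5]) - mean(tp_avg[1:3])

c(contrast = est_contrast, averages = est_manual)
```

The two numbers should agree to floating-point precision; with missing samples (unbalanced data) they generally will not, because the subject adjustment then matters.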


Thanks Aaron. Much appreciated.


To interpret the first model: does it essentially average the expression of the three pre-treatment time points, average the expression of the two on-treatment time points, and then find DE genes by comparing those two averages?

Thanks again.


Yes. If you need more stringency, you can test each pair of on-treatment vs pre-treatment timepoints to verify that genes are consistently DE (in the same direction) between groups.
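One way to set up the stricter check is sketched below in base R. It builds one contrast per (on-treatment, pre-treatment) pair, 2 x 3 = 6 in total, against the first model's design columns (assumed to be named `timepoint1` ... `timepoint6`, `subject2`, `subject3` for three subjects). The helper `consistent_de` is hypothetical: it calls a gene consistent if it passes an assumed FDR cutoff in every pairwise test and its log-fold change has the same sign in all of them. The edgeR calls appear only as comments, and the toy logFC/FDR values are made up for illustration.

```r
# Hypothetical design column names, as produced by model.matrix(~0+timepoint+subject)
# with timepoint levels 1..6 and three subjects.
design_cols <- c(paste0("timepoint", 1:6), "subject2", "subject3")

# One contrast per (on-treatment, pre-treatment) pair: 2 x 3 = 6 contrasts.
pairs <- expand.grid(on = 4:5, pre = 1:3)
pair_contrasts <- sapply(seq_len(nrow(pairs)), function(k) {
  v <- setNames(numeric(length(design_cols)), design_cols)
  v[paste0("timepoint", pairs$on[k])]  <- 1
  v[paste0("timepoint", pairs$pre[k])] <- -1
  v
})
colnames(pair_contrasts) <- paste0("T", pairs$on, "vsT", pairs$pre)

# Each column could then be tested in edgeR, e.g. (not run here):
#   res <- glmQLFTest(fit, contrast = pair_contrasts[, k])
#   tab <- topTags(res, n = Inf, sort.by = "none")$table
# and the logFC and FDR columns collected into gene x contrast matrices.

# Hypothetical helper: significant in every pairwise contrast AND the
# log-fold change has the same sign in all of them.
consistent_de <- function(logfc, fdr, alpha = 0.05) {
  all_sig  <- apply(fdr < alpha, 1, all)
  same_dir <- apply(logfc > 0, 1, all) | apply(logfc < 0, 1, all)
  all_sig & same_dir
}

# Toy values for three hypothetical genes (made up for illustration only).
logfc <- rbind(geneA = rep(1.2, 6), geneB = c(1, 1, -1, 1, 1, 1), geneC = rep(0.8, 6))
fdr   <- rbind(geneA = rep(0.01, 6), geneB = rep(0.01, 6), geneC = c(rep(0.01, 5), 0.2))
consistent_de(logfc, fdr)  # geneA TRUE, geneB FALSE, geneC FALSE
```

Requiring significance in all six tests is conservative; the averaged OnvsPre contrast remains the primary test, and this filter only narrows its results.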