Design matrix choice for time series experiment using limma
1
0
Entering edit mode
rm1238 • 0
@rm1238-19204
Last seen 2.1 years ago

Hello,

I have performed time-series experiments (3 biological replicates) in human cells. We want to identify differentially regulated genes at any time point relative to the T0 time point. I am using splines to model the data (using scripts developed by a former colleague).

>My present design matrix is as follows:

# design_full <- data.frame(Treatment = gsub("_.*", "", colnames(data_rma_no_batch)),
                             Time = rep(c(0,1,2,3,4,5,6), 3),
                             Batch = factor(rep(seq(from = 1, to = 3), each = 7)))

  Treatment Time Batch
1        T1    0     1
2        T1    1     1
3        T1    2     1
4        T1    3     1
5        T1    4     1
6        T1    5     1
7        T1    6     1
8        T1    0     2
9        T1    1     2
10       T1    2     2
11       T1    3     2
12       T1    4     2
13       T1    5     2
14       T1    6     2
15       T1    0     3
16       T1    1     3
17       T1    2     3
18       T1    3     3
19       T1    4     3
20       T1    5     3
21       T1    6     3

>Then I use the following code for the splines:

#design_splines_full <-  ns(design_full$Time, df = 3)

> which gives what I think the expected numbers in that the 0.000 value always falls within my T0 reference:

                1                2               3
 [1,]  0.00000000 0.0000000  0.0000000
 [2,] -0.09686347 0.3530904 -0.2353936
 [3,] -0.01426895 0.5428068 -0.3618712
 [4,]  0.32047336 0.4760799 -0.2965533
 [5,]  0.55298038 0.3410589 -0.0607059
 [6,]  0.34590826 0.3372752  0.2959832
 [7,] -0.14285714 0.4285714  0.7142857
 [8,]  0.00000000 0.0000000  0.0000000
 [9,] -0.09686347 0.3530904 -0.2353936
[10,] -0.01426895 0.5428068 -0.3618712
[11,]  0.32047336 0.4760799 -0.2965533
[12,]  0.55298038 0.3410589 -0.0607059
[13,]  0.34590826 0.3372752  0.2959832
[14,] -0.14285714 0.4285714  0.7142857
[15,]  0.00000000 0.0000000  0.0000000
[16,] -0.09686347 0.3530904 -0.2353936
[17,] -0.01426895 0.5428068 -0.3618712
[18,]  0.32047336 0.4760799 -0.2965533
[19,]  0.55298038 0.3410589 -0.0607059
[20,]  0.34590826 0.3372752  0.2959832
[21,] -0.14285714 0.4285714  0.7142857
attr(,"degree")
[1] 3
attr(,"knots")
33.33333% 66.66667% 
        2         4 
attr(,"Boundary.knots")
[1] 0 6
attr(,"intercept")
[1] FALSE
attr(,"class")
[1] "ns"     "basis"  "matrix"

> However, when I try to create a design to test the effect of time only, the following is returned:

# design_limma_time <- model.matrix(~ design_full$Treatment + design_splines_full)

  Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

>I suspect this is something to do with my original matrix, and in particular with how the reference time point T0 is represented in the matrix.

I am more of a wet lab person and my experience with R is a little less than moderate. I apologise in advance for the naive question, but any help here would be greatly appreciated.

 

Kind regards,

Iván

 

 

limma time series help • 346 views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 5 hours ago
The city by the bay

It does not make sense to use Treatment in your design matrix. You only have one unique value (i.e., one level) in Treatment; there's no point including it in the design matrix, because there's nothing to compare it to. Rather, it makes more sense to put Batch in your design matrix as an additive factor. Presumably this refers to experimental batches or runs of the time course, and you should block on this to avoid unwanted variation.

Keep in mind that, if you're using splines, you should always be testing all of the spline coefficients together in topTable, e.g., coef=2:4 assuming the second to fourth coefficients are the spline terms. All spline coefficients are used to construct the fitted trend, so by setting all of them to zero with coef=, the null hypothesis is that there is no trend. (Note that testing individual spline coefficients does not make sense in the vast majority of cases.) This is another way of formulating your "any difference from zero time" question; if there were a trend, then at least one other time point would have different expression from that at zero time.

I should also point out that another valid formulation for your design matrix would be to convert Time into a factor and use ~Time + Batch. You can then do an ANOVA to test for any difference between the time points. This avoids the assumption of a smooth change in expression over time (which is necessary for the spline to work), at the cost of losing some residual d.f. and thus power.

ADD COMMENT
0
Entering edit mode

Hello Aaron,

Thank you for your detailed answer. Indeed, the batches are three different biological replicates of the same experiment. I will be working on your suggestions.

 

Thank you again

ADD REPLY

Login before adding your answer.

Traffic: 224 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6