Design matrix choice for time series experiment using limma
1
0
Entering edit mode
rm1238 • 0
@rm1238-19204
Last seen 3.0 years ago

Hello,

I have performed time-series experiments (3 biological replicates) in human cells. We want to identify differentially regulated genes at any time point relative to the T0 time point. I am using splines to model the data (using scripts developed by a former colleague).

>My present design matrix is as follows:

# design_full <- data.frame(Treatment = gsub("_.*", "", colnames(data_rma_no_batch)),
Time = rep(c(0,1,2,3,4,5,6), 3),
Batch = factor(rep(seq(from = 1, to = 3), each = 7)))

Treatment Time Batch
1        T1    0     1
2        T1    1     1
3        T1    2     1
4        T1    3     1
5        T1    4     1
6        T1    5     1
7        T1    6     1
8        T1    0     2
9        T1    1     2
10       T1    2     2
11       T1    3     2
12       T1    4     2
13       T1    5     2
14       T1    6     2
15       T1    0     3
16       T1    1     3
17       T1    2     3
18       T1    3     3
19       T1    4     3
20       T1    5     3
21       T1    6     3

>Then I use the following code for the splines:

#design_splines_full <-  ns(design_full$Time, df = 3) > which gives what I think the expected numbers in that the 0.000 value always falls within my T0 reference: 1 2 3 [1,] 0.00000000 0.0000000 0.0000000 [2,] -0.09686347 0.3530904 -0.2353936 [3,] -0.01426895 0.5428068 -0.3618712 [4,] 0.32047336 0.4760799 -0.2965533 [5,] 0.55298038 0.3410589 -0.0607059 [6,] 0.34590826 0.3372752 0.2959832 [7,] -0.14285714 0.4285714 0.7142857 [8,] 0.00000000 0.0000000 0.0000000 [9,] -0.09686347 0.3530904 -0.2353936 [10,] -0.01426895 0.5428068 -0.3618712 [11,] 0.32047336 0.4760799 -0.2965533 [12,] 0.55298038 0.3410589 -0.0607059 [13,] 0.34590826 0.3372752 0.2959832 [14,] -0.14285714 0.4285714 0.7142857 [15,] 0.00000000 0.0000000 0.0000000 [16,] -0.09686347 0.3530904 -0.2353936 [17,] -0.01426895 0.5428068 -0.3618712 [18,] 0.32047336 0.4760799 -0.2965533 [19,] 0.55298038 0.3410589 -0.0607059 [20,] 0.34590826 0.3372752 0.2959832 [21,] -0.14285714 0.4285714 0.7142857 attr(,"degree") [1] 3 attr(,"knots") 33.33333% 66.66667% 2 4 attr(,"Boundary.knots") [1] 0 6 attr(,"intercept") [1] FALSE attr(,"class") [1] "ns" "basis" "matrix" > However, when I try to create a design to test the effect of time only, the following is returned: # design_limma_time <- model.matrix(~ design_full$Treatment + design_splines_full)

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

>I suspect this is something to do with my original matrix, and in particular with how the reference time point T0 is represented in the matrix.

I am more of a wet lab person and my experience with R is a little less than moderate. I apologise in advance for the naive question, but any help here would be greatly appreciated.

Kind regards,

Iván

limma time series help • 617 views
0
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 5 hours ago
The city by the bay

It does not make sense to use Treatment in your design matrix. You only have one unique value (i.e., one level) in Treatment; there's no point including it in the design matrix, because there's nothing to compare it to. Rather, it makes more sense to put Batch in your design matrix as an additive factor. Presumably this refers to experimental batches or runs of the time course, and you should block on this to avoid unwanted variation.

Keep in mind that, if you're using splines, you should always be testing all of the spline coefficients together in topTable, e.g., coef=2:4 assuming the second to fourth coefficients are the spline terms. All spline coefficients are used to construct the fitted trend, so by setting all of them to zero with coef=, the null hypothesis is that there is no trend. (Note that testing individual spline coefficients does not make sense in the vast majority of cases.) This is another way of formulating your "any difference from zero time" question; if there were a trend, then at least one other time point would have different expression from that at zero time.

I should also point out that another valid formulation for your design matrix would be to convert Time into a factor and use ~Time + Batch. You can then do an ANOVA to test for any difference between the time points. This avoids the assumption of a smooth change in expression over time (which is necessary for the spline to work), at the cost of losing some residual d.f. and thus power.

0
Entering edit mode

Hello Aaron,

Thank you for your detailed answer. Indeed, the batches are three different biological replicates of the same experiment. I will be working on your suggestions.

Thank you again