Question

Limma : design and contrast for a timecourse experiment with 2 cell lines

0

Entering edit mode

Pierre-François Roux • 0

@pierre-francois-roux-7997

Last seen 7.5 years ago

France

Dear BioC list,

I am currently working on a microarray timecourse experiment described bellow :

Six time points,
Two cell types A and B,
No control experiment.

Basically, I am studying the effect through time of a treatment on gene expressions in different cell lines. The biological questions I have are :

Which are the genes that are variable through time in one cell type but not the other ?

Which are the genes that are variable through time and whose expression is concordant among cell types ?

Following the advices given in the Limma vignette (p.47 and 48), as well as explanations given in several BioC threads (question about design for limma time course, 2 conditions and drug treatment microarray experiment, design and contrast matrix for limma time series without replicates, https://support.bioconductor.org/p/58249/) I am trying to use Limma to analyze this data set, and to include splines when building my design matrix.

Here is what I've done so far :

> Design <- data.frame(Cell = c(rep("A", 6), rep("B", 6)),
                     Time = rep(c(0,24,48,96,144,192), 2))

> print(Design)
   Cell Time
1     A    0
2     A   24
3     A   48
4     A   96
5     A  144
6     A  192
7     B    0
8     B   24
9     B   48
10    B   96
11    B  144
12    B  192

> Splines <- ns(Design$Time, df = 3)
> Design_LIMMA_Cell <- model.matrix(~ Splines * Design$Cell)
> print(Design_LIMMA_Cell)
   (Intercept)    Splines1  Splines2   Splines3 Design$CellB Splines1:Design$CellB Splines2:Design$CellB Splines3:Design$CellB
1            1  0.00000000 0.0000000  0.0000000            0            0.00000000             0.0000000             0.0000000
2            1 -0.10236787 0.3434740 -0.2250347            0            0.00000000             0.0000000             0.0000000
3            1 -0.05914652 0.5404054 -0.3538571            0            0.00000000             0.0000000             0.0000000
4            1  0.40630077 0.4406083 -0.2195072            0            0.00000000             0.0000000             0.0000000
5            1  0.41932830 0.3328953  0.2004080            0            0.00000000             0.0000000             0.0000000
6            1 -0.14592934 0.4231951  0.7227343            0            0.00000000             0.0000000             0.0000000
7            1  0.00000000 0.0000000  0.0000000            1            0.00000000             0.0000000             0.0000000
8            1 -0.10236787 0.3434740 -0.2250347            1           -0.10236787             0.3434740            -0.2250347
9            1 -0.05914652 0.5404054 -0.3538571            1           -0.05914652             0.5404054            -0.3538571
10           1  0.40630077 0.4406083 -0.2195072            1            0.40630077             0.4406083            -0.2195072
11           1  0.41932830 0.3328953  0.2004080            1            0.41932830             0.3328953             0.2004080
12           1 -0.14592934 0.4231951  0.7227343            1           -0.14592934             0.4231951             0.7227343
attr(,"assign")
[1] 0 1 1 1 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$`Design$Cell`
[1] "contr.treatment"

> Fit_LIMMA_Cell <- lmFit(Data_RMA_Clean, Design_LIMMA_Cell)
> Fit_LIMMA_Cell <- eBayes(Fit_LIMMA_Cell)

To detect genes with different trends between the two cell lines, I just have to conduct the F-test on the last 3 parameters corresponding to differences in the curves between the two cell lines. Am I right ?

> Sig_LIMMA_Time_Cell_Spe <- topTable(Fit2_LIMMA_Cell, coef = c(6:8) , adjust="BH", number = Inf)

However, I am not sure how to proceed to answer my second biological question, i.e. highlighting variable genes whose change in expression is concordant between cell types.

Is my design matrix suited for this specific question ? Is using topTable(Fit2_LIMMA_Cell, coef = c(2:4) , adjust="BH", number = Inf) the way to go ?

Usually, people are interested in detecting changes through time comparing treated samples vs control samples. But for this specific experiments, I am also interested in identifying genes with the same behavior among cell types, with the idea to identify the "core" transcriptional network involved in the cellular answer to this specific treatment.

Any help is much appreciated !

Thanks a lot,

Cheers,

Pef

limma limma design matrix timecourse • 2.6k views

ADD COMMENT • link 10.0 years ago Pierre-François Roux • 0

0

Entering edit mode

Pierre-François Roux • 0

@pierre-francois-roux-7997

Last seen 7.5 years ago

France

Thank you so much Aaron for your precious advices.

Using your approach to highlight variable genes whose change in expression is concordant between cell types, even if probably too conservative, gave nice results. Do you think about an alternative approach to Limma to address this question in a more suited manner ?

Since I am thinking about adding more data in this experimental design (additional cell lines, and biological replicates for each cell line / time point that have been processed in different batches), at the end I will work with a more complicated design matrix that should look like this :

> Design <- data.frame(Cell = c(rep("A", 12), rep("B", 12), rep("C", 12)),
+                      Time = rep(c(0,24,48,96,144,192), 6),
+                      Batch = rep(seq(from = 1, to = 6), each = 6))
> Splines <- ns(Design$Time, df = 3)
> Design
   Cell Time Batch
1     A    0     1
2     A   24     1
3     A   48     1
4     A   96     1
5     A  144     1
6     A  192     1
7     A    0     2
8     A   24     2
9     A   48     2
10    A   96     2
11    A  144     2
12    A  192     2
13    B    0     3
14    B   24     3
15    B   48     3
16    B   96     3
17    B  144     3
18    B  192     3
19    B    0     4
20    B   24     4
21    B   48     4
22    B   96     4
23    B  144     4
24    B  192     4
25    C    0     5
26    C   24     5
27    C   48     5
28    C   96     5
29    C  144     5
30    C  192     5
31    C    0     6
32    C   24     6
33    C   48     6
34    C   96     6
35    C  144     6
36    C  192     6

I am trying to define the model matrix including the aditionnal cell type and the correction for the batch effect to answer the same specific biological question, i.e. :

Which are the genes that are variable through time in one cell type but not the other ?

Which are the genes that are variable through time and whose expression is concordant among cell types ?

Here is how I am defining the new model matrix :

> Design_LIMMA_Cell <- model.matrix(~ Design$Batch + Splines * Design$Cell)
> Design_LIMMA_Cell
   (Intercept) Design$Batch    Splines1  Splines2   Splines3 Design$CellB Design$CellC Splines1:Design$CellB
1            1            1  0.00000000 0.0000000  0.0000000            0            0            0.00000000
2            1            1 -0.10236787 0.3434740 -0.2250347            0            0            0.00000000
3            1            1 -0.05914652 0.5404054 -0.3538571            0            0            0.00000000
4            1            1  0.40630077 0.4406083 -0.2195072            0            0            0.00000000
5            1            1  0.41932830 0.3328953  0.2004080            0            0            0.00000000
6            1            1 -0.14592934 0.4231951  0.7227343            0            0            0.00000000
7            1            2  0.00000000 0.0000000  0.0000000            0            0            0.00000000
8            1            2 -0.10236787 0.3434740 -0.2250347            0            0            0.00000000
9            1            2 -0.05914652 0.5404054 -0.3538571            0            0            0.00000000
10           1            2  0.40630077 0.4406083 -0.2195072            0            0            0.00000000
11           1            2  0.41932830 0.3328953  0.2004080            0            0            0.00000000
12           1            2 -0.14592934 0.4231951  0.7227343            0            0            0.00000000
13           1            3  0.00000000 0.0000000  0.0000000            1            0            0.00000000
14           1            3 -0.10236787 0.3434740 -0.2250347            1            0           -0.10236787
15           1            3 -0.05914652 0.5404054 -0.3538571            1            0           -0.05914652
16           1            3  0.40630077 0.4406083 -0.2195072            1            0            0.40630077
17           1            3  0.41932830 0.3328953  0.2004080            1            0            0.41932830
18           1            3 -0.14592934 0.4231951  0.7227343            1            0           -0.14592934
19           1            4  0.00000000 0.0000000  0.0000000            1            0            0.00000000
20           1            4 -0.10236787 0.3434740 -0.2250347            1            0           -0.10236787
21           1            4 -0.05914652 0.5404054 -0.3538571            1            0           -0.05914652
22           1            4  0.40630077 0.4406083 -0.2195072            1            0            0.40630077
23           1            4  0.41932830 0.3328953  0.2004080            1            0            0.41932830
24           1            4 -0.14592934 0.4231951  0.7227343            1            0           -0.14592934
25           1            5  0.00000000 0.0000000  0.0000000            0            1            0.00000000
26           1            5 -0.10236787 0.3434740 -0.2250347            0            1            0.00000000
27           1            5 -0.05914652 0.5404054 -0.3538571            0            1            0.00000000
28           1            5  0.40630077 0.4406083 -0.2195072            0            1            0.00000000
29           1            5  0.41932830 0.3328953  0.2004080            0            1            0.00000000
30           1            5 -0.14592934 0.4231951  0.7227343            0            1            0.00000000
31           1            6  0.00000000 0.0000000  0.0000000            0            1            0.00000000
32           1            6 -0.10236787 0.3434740 -0.2250347            0            1            0.00000000
33           1            6 -0.05914652 0.5404054 -0.3538571            0            1            0.00000000
34           1            6  0.40630077 0.4406083 -0.2195072            0            1            0.00000000
35           1            6  0.41932830 0.3328953  0.2004080            0            1            0.00000000
36           1            6 -0.14592934 0.4231951  0.7227343            0            1            0.00000000
   Splines2:Design$CellB Splines3:Design$CellB Splines1:Design$CellC Splines2:Design$CellC Splines3:Design$CellC
1              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
2              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
3              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
4              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
5              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
6              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
7              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
8              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
9              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
10             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
11             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
12             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
13             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
14             0.3434740            -0.2250347            0.00000000             0.0000000             0.0000000
15             0.5404054            -0.3538571            0.00000000             0.0000000             0.0000000
16             0.4406083            -0.2195072            0.00000000             0.0000000             0.0000000
17             0.3328953             0.2004080            0.00000000             0.0000000             0.0000000
18             0.4231951             0.7227343            0.00000000             0.0000000             0.0000000
19             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
20             0.3434740            -0.2250347            0.00000000             0.0000000             0.0000000
21             0.5404054            -0.3538571            0.00000000             0.0000000             0.0000000
22             0.4406083            -0.2195072            0.00000000             0.0000000             0.0000000
23             0.3328953             0.2004080            0.00000000             0.0000000             0.0000000
24             0.4231951             0.7227343            0.00000000             0.0000000             0.0000000
25             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
26             0.0000000             0.0000000           -0.10236787             0.3434740            -0.2250347
27             0.0000000             0.0000000           -0.05914652             0.5404054            -0.3538571
28             0.0000000             0.0000000            0.40630077             0.4406083            -0.2195072
29             0.0000000             0.0000000            0.41932830             0.3328953             0.2004080
30             0.0000000             0.0000000           -0.14592934             0.4231951             0.7227343
31             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
32             0.0000000             0.0000000           -0.10236787             0.3434740            -0.2250347
33             0.0000000             0.0000000           -0.05914652             0.5404054            -0.3538571
34             0.0000000             0.0000000            0.40630077             0.4406083            -0.2195072
35             0.0000000             0.0000000            0.41932830             0.3328953             0.2004080
36             0.0000000             0.0000000           -0.14592934             0.4231951             0.7227343
attr(,"assign")
 [1] 0 1 2 2 2 3 3 4 4 4 4 4 4
attr(,"contrasts")
attr(,"contrasts")$`Design$Cell`
[1] "contr.treatment"

In this case, does that make sens to to conduct the F-test on the last 6 parameters to detect genes with different trends between any pairs of cell lines with Sig_LIMMA_All_TT <- topTable(Fit2_LIMMA, coef=c(8:13), adjust="BH", number = Inf) ?

Or do I need to conduct the test in a pairwise way using ? :

Sig_LIMMA_A_B <- topTable(Fit2_LIMMA, coef=c(8:10, ), adjust="BH", number = Inf)
Sig_LIMMA_A_C <- topTable(Fit2_LIMMA, coef=c(11:13), adjust="BH", number = Inf)

And finally, how to look for differences between B and C in this way ? I assume I have to manually define the contrast using makeContrasts() but I don't really understand how to properly use it.

I am not sure how to adapt the approach you've described in your previous post to this more complicated design matrix.

Many thanks for your help.

Cheers,

Pef

ADD COMMENT • link 10.0 years ago Pierre-François Roux • 0

2

Entering edit mode

You need to make batch a factor. When you do so, the batch becomes fully nested within the cell type, so there's no point including the latter in the design matrix. Instead, you can do:

Batch <- factor(Design$Batch)
Design_LIMMA_Cell <- model.matrix(~0 + Batch + Splines:Design$Cell)

I use an intercept-free model because it's easier to explain. In particular, if you look at the coefficients, you get:

 [1] "Batch1"                "Batch2"                "Batch3"               
 [4] "Batch4"                "Batch5"                "Batch6"               
 [7] "Splines1:Design$CellA" "Splines2:Design$CellA" "Splines3:Design$CellA"
[10] "Splines1:Design$CellB" "Splines2:Design$CellB" "Splines3:Design$CellB"
[13] "Splines1:Design$CellC" "Splines2:Design$CellC" "Splines3:Design$CellC"

The first 6 coefficients are the batch-specific intercepts of the spline, and can be ignored. This is because you're only interested in the response to time within each cell type, which are represented by the remaining cell type-specific spline coefficients. You can then deal with these like you did before with the previous design, i.e., containing just A and B.

Edit: A bit of clarification; to do the comparisons between A and B here, you'll have to do a bit more work:

# Substitute ':' and '$' with '.' in colnames(design)
con <- makeContrasts(Splines1.Design.CellA - Splines1.Design.CellB,
          Splines2.Design.CellA - Splines2.Design.CellB,
          Splines3.Design.CellA - Splines3.Design.CellB, levels=Design_LIMMA_Cell)

This is because the parametrization of the spline coefficients is different in an intercept-free model. Here, each set of spline coefficients represents that of the spline for each cell type, not the difference in splines between cell types.

ADD REPLY • link 10.0 years ago Aaron Lun ★ 29k

0

Entering edit mode

Thanks a lot Aaron. This is far more clearer.

Using this new model matrix, if I want to highlight genes exhibiting any time effect, I should use : any.time <- topTable(Fit2_LIMMA_Cell, coef = c(7:15), n=Inf, sort.by="none"). Am I right ?

Thx !

ADD REPLY • link 10.0 years ago Pierre-François Roux • 0

1

Entering edit mode

Yep, that's right. That said, going back to the original question, if you want to identify genes with a time effect that is the same in A and B, then you don't necessarily care what happens in C; in such a case, you should only test for "any time effect" in A and B by themselves. This would be equivalent to coef=7:12.

ADD REPLY • link 10.0 years ago Aaron Lun ★ 29k

score 3 · Accepted Answer · 2015-12-21

For the first question; yes, your contrast is fine. Note that you'll only reject when the shape of the curve changes between cell types, but not when the relative position of the curve changes. For example, if you have a gene where one cell type has half the expression of the other, but the response to time is the same in both, then you won't reject the null with this contrast. That's okay, provided that's what you want.

Your second question is not answerable directly with topTable, as part of what you're looking for is non-DE between cell types. This is not simple to do in a rigorous manner, but we can often get something that's good enough for practical purposes. I would start off by identifying genes that exhibit any time effect:

any.time <- topTable(Fit2_LIMMA_Cell, coef = c(2:4,6:8), n=Inf, sort.by="none")

Then, I would look at the non-significant genes in the contrast from your first question:

cell.spec.time <- topTable(Fit2_LIMMA_Cell, coef = 6:8, number = Inf, sort.by="none")
no.cell.spec <- cell.spec.time$P.Value > 0.2

I would usually use the confidence intervals to do this, but topTable won't report them for multiple coefficients and it's a bit tedious to have to do it manually. Identifying "non-DE" genes by filtering on large p-values is a bit sketchier, because you can get DE genes with large log-fold changes if they have large variances as well. Nonetheless, we can intersect these with those genes with a significant time effect:

of.interest <- any.time$adj.P.Val <= 0.05 & no.cell.spec

That should give you the set of genes that exhibit a significant response to time that is similar in both cell types. If this is too conservative, maybe you should ask yourself whether you need genes that exhibit exactly the same quantitative response in both cell types. Sometimes, having the same qualitative response is good enough (i.e., significantly upregulated with time in both cell types).