Question: Limma : design and contrast for a timecourse experiment with 2 cell lines
0
3.8 years ago by
France
Pierre-François Roux0 wrote:

Dear BioC list,

I am currently working on a microarray timecourse experiment described bellow :

• Six time points,
• Two cell types A and B,
• No control experiment.

Basically, I am studying the effect through time of a treatment on gene expressions in different cell lines. The biological questions I have are :

• Which are the genes that are variable through time in one cell type but not the other ?
• Which are the genes that are variable through time and whose expression is concordant among cell types ?

Following the advices given in the Limma vignette (p.47 and 48), as well as explanations given in several BioC threads (question about design for limma time course, 2 conditions and drug treatment microarray experimentdesign and contrast matrix for limma time series without replicateshttps://support.bioconductor.org/p/58249/) I am trying to use Limma to analyze this data set, and to include splines when building my design matrix.

Here is what I've done so far :

> Design <- data.frame(Cell = c(rep("A", 6), rep("B", 6)),
Time = rep(c(0,24,48,96,144,192), 2))

> print(Design)
Cell Time
1     A    0
2     A   24
3     A   48
4     A   96
5     A  144
6     A  192
7     B    0
8     B   24
9     B   48
10    B   96
11    B  144
12    B  192

> Splines <- ns(Design$Time, df = 3) > Design_LIMMA_Cell <- model.matrix(~ Splines * Design$Cell)
> print(Design_LIMMA_Cell)
(Intercept)    Splines1  Splines2   Splines3 Design$CellB Splines1:Design$CellB Splines2:Design$CellB Splines3:Design$CellB
1            1  0.00000000 0.0000000  0.0000000            0            0.00000000             0.0000000             0.0000000
2            1 -0.10236787 0.3434740 -0.2250347            0            0.00000000             0.0000000             0.0000000
3            1 -0.05914652 0.5404054 -0.3538571            0            0.00000000             0.0000000             0.0000000
4            1  0.40630077 0.4406083 -0.2195072            0            0.00000000             0.0000000             0.0000000
5            1  0.41932830 0.3328953  0.2004080            0            0.00000000             0.0000000             0.0000000
6            1 -0.14592934 0.4231951  0.7227343            0            0.00000000             0.0000000             0.0000000
7            1  0.00000000 0.0000000  0.0000000            1            0.00000000             0.0000000             0.0000000
8            1 -0.10236787 0.3434740 -0.2250347            1           -0.10236787             0.3434740            -0.2250347
9            1 -0.05914652 0.5404054 -0.3538571            1           -0.05914652             0.5404054            -0.3538571
10           1  0.40630077 0.4406083 -0.2195072            1            0.40630077             0.4406083            -0.2195072
11           1  0.41932830 0.3328953  0.2004080            1            0.41932830             0.3328953             0.2004080
12           1 -0.14592934 0.4231951  0.7227343            1           -0.14592934             0.4231951             0.7227343
attr(,"assign")
[1] 0 1 1 1 2 3 3 3
attr(,"contrasts")
attr(,"contrasts")$Design$Cell
[1] "contr.treatment"

> Fit_LIMMA_Cell <- lmFit(Data_RMA_Clean, Design_LIMMA_Cell)
> Fit_LIMMA_Cell <- eBayes(Fit_LIMMA_Cell)

To detect genes with different trends between the two cell lines, I just have to conduct the F-test on the last 3 parameters corresponding to differences in the curves between the two cell lines. Am I right ?

> Sig_LIMMA_Time_Cell_Spe <- topTable(Fit2_LIMMA_Cell, coef = c(6:8) , adjust="BH", number = Inf)​

However, I am not sure how to proceed to answer my second biological question, i.e. highlighting variable genes whose change in expression is concordant between cell types.

Is my design matrix suited for this specific question ? Is using topTable(Fit2_LIMMA_Cell, coef = c(2:4) , adjust="BH", number = Inf)​ the way to go ?

Usually, people are interested in detecting changes through time comparing treated samples vs control samples. But for this specific experiments, I am also interested in identifying genes with the same behavior among cell types, with the idea to identify the "core" transcriptional network involved in the cellular answer to this specific treatment.

Any help is much appreciated !

Thanks a lot,

Cheers,

Pef

modified 3.8 years ago • written 3.8 years ago by Pierre-François Roux0
Answer: Limma : design and contrast for a timecourse experiment with 2 cell lines
3
3.8 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

For the first question; yes, your contrast is fine. Note that you'll only reject when the shape of the curve changes between cell types, but not when the relative position of the curve changes. For example, if you have a gene where one cell type has half the expression of the other, but the response to time is the same in both, then you won't reject the null with this contrast. That's okay, provided that's what you want.

Your second question is not answerable directly with topTable, as part of what you're looking for is non-DE between cell types. This is not simple to do in a rigorous manner, but we can often get something that's good enough for practical purposes. I would start off by identifying genes that exhibit any time effect:

any.time <- topTable(Fit2_LIMMA_Cell, coef = c(2:4,6:8), n=Inf, sort.by="none")

Then, I would look at the non-significant genes in the contrast from your first question:

cell.spec.time <- topTable(Fit2_LIMMA_Cell, coef = 6:8, number = Inf, sort.by="none")
​no.cell.spec <- cell.spec.time$P.Value > 0.2  I would usually use the confidence intervals to do this, but topTable won't report them for multiple coefficients and it's a bit tedious to have to do it manually. Identifying "non-DE" genes by filtering on large p-values is a bit sketchier, because you can get DE genes with large log-fold changes if they have large variances as well. Nonetheless, we can intersect these with those genes with a significant time effect: of.interest <- any.time$adj.P.Val <= 0.05 & no.cell.spec

That should give you the set of genes that exhibit a significant response to time that is similar in both cell types. If this is too conservative, maybe you should ask yourself whether you need genes that exhibit exactly the same quantitative response in both cell types. Sometimes, having the same qualitative response is good enough (i.e., significantly upregulated with time in both cell types).

Answer: Limma : design and contrast for a timecourse experiment with 2 cell lines
0
3.8 years ago by
France
Pierre-François Roux0 wrote:

Using your approach to highlight variable genes whose change in expression is concordant between cell types, even if probably too conservative, gave nice results. Do you think about an alternative approach to Limma to address this question in a more suited manner ?

Since I am thinking about adding more data in this experimental design (additional cell lines, and biological replicates for each cell line / time point that have been processed in different batches), at the end I will work with a more complicated design matrix that should look like this :

> Design <- data.frame(Cell = c(rep("A", 12), rep("B", 12), rep("C", 12)),
+                      Time = rep(c(0,24,48,96,144,192), 6),
+                      Batch = rep(seq(from = 1, to = 6), each = 6))
> Splines <- ns(Design$Time, df = 3) > Design Cell Time Batch 1 A 0 1 2 A 24 1 3 A 48 1 4 A 96 1 5 A 144 1 6 A 192 1 7 A 0 2 8 A 24 2 9 A 48 2 10 A 96 2 11 A 144 2 12 A 192 2 13 B 0 3 14 B 24 3 15 B 48 3 16 B 96 3 17 B 144 3 18 B 192 3 19 B 0 4 20 B 24 4 21 B 48 4 22 B 96 4 23 B 144 4 24 B 192 4 25 C 0 5 26 C 24 5 27 C 48 5 28 C 96 5 29 C 144 5 30 C 192 5 31 C 0 6 32 C 24 6 33 C 48 6 34 C 96 6 35 C 144 6 36 C 192 6  I am trying to define the model matrix including the aditionnal cell type and the correction for the batch effect to answer the same specific biological question, i.e. : • Which are the genes that are variable through time in one cell type but not the other ? • Which are the genes that are variable through time and whose expression is concordant among cell types ? Here is how I am defining the new model matrix : > Design_LIMMA_Cell <- model.matrix(~ Design$Batch + Splines * Design$Cell) > Design_LIMMA_Cell (Intercept) Design$Batch    Splines1  Splines2   Splines3 Design$CellB Design$CellC Splines1:Design$CellB 1 1 1 0.00000000 0.0000000 0.0000000 0 0 0.00000000 2 1 1 -0.10236787 0.3434740 -0.2250347 0 0 0.00000000 3 1 1 -0.05914652 0.5404054 -0.3538571 0 0 0.00000000 4 1 1 0.40630077 0.4406083 -0.2195072 0 0 0.00000000 5 1 1 0.41932830 0.3328953 0.2004080 0 0 0.00000000 6 1 1 -0.14592934 0.4231951 0.7227343 0 0 0.00000000 7 1 2 0.00000000 0.0000000 0.0000000 0 0 0.00000000 8 1 2 -0.10236787 0.3434740 -0.2250347 0 0 0.00000000 9 1 2 -0.05914652 0.5404054 -0.3538571 0 0 0.00000000 10 1 2 0.40630077 0.4406083 -0.2195072 0 0 0.00000000 11 1 2 0.41932830 0.3328953 0.2004080 0 0 0.00000000 12 1 2 -0.14592934 0.4231951 0.7227343 0 0 0.00000000 13 1 3 0.00000000 0.0000000 0.0000000 1 0 0.00000000 14 1 3 -0.10236787 0.3434740 -0.2250347 1 0 -0.10236787 15 1 3 -0.05914652 0.5404054 -0.3538571 1 0 -0.05914652 16 1 3 0.40630077 0.4406083 -0.2195072 1 0 0.40630077 17 1 3 0.41932830 0.3328953 0.2004080 1 0 0.41932830 18 1 3 -0.14592934 0.4231951 0.7227343 1 0 -0.14592934 19 1 4 0.00000000 0.0000000 0.0000000 1 0 0.00000000 20 1 4 -0.10236787 0.3434740 -0.2250347 1 0 -0.10236787 21 1 4 -0.05914652 0.5404054 -0.3538571 1 0 -0.05914652 22 1 4 0.40630077 0.4406083 -0.2195072 1 0 0.40630077 23 1 4 0.41932830 0.3328953 0.2004080 1 0 0.41932830 24 1 4 -0.14592934 0.4231951 0.7227343 1 0 -0.14592934 25 1 5 0.00000000 0.0000000 0.0000000 0 1 0.00000000 26 1 5 -0.10236787 0.3434740 -0.2250347 0 1 0.00000000 27 1 5 -0.05914652 0.5404054 -0.3538571 0 1 0.00000000 28 1 5 0.40630077 0.4406083 -0.2195072 0 1 0.00000000 29 1 5 0.41932830 0.3328953 0.2004080 0 1 0.00000000 30 1 5 -0.14592934 0.4231951 0.7227343 0 1 0.00000000 31 1 6 0.00000000 0.0000000 0.0000000 0 1 0.00000000 32 1 6 -0.10236787 0.3434740 -0.2250347 0 1 0.00000000 33 1 6 -0.05914652 0.5404054 -0.3538571 0 1 0.00000000 34 1 6 0.40630077 0.4406083 -0.2195072 0 1 0.00000000 35 1 6 0.41932830 0.3328953 0.2004080 0 1 0.00000000 36 1 6 -0.14592934 0.4231951 0.7227343 0 1 0.00000000 Splines2:Design$CellB Splines3:Design$CellB Splines1:Design$CellC Splines2:Design$CellC Splines3:Design$CellC
1              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
2              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
3              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
4              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
5              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
6              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
7              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
8              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
9              0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
10             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
11             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
12             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
13             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
14             0.3434740            -0.2250347            0.00000000             0.0000000             0.0000000
15             0.5404054            -0.3538571            0.00000000             0.0000000             0.0000000
16             0.4406083            -0.2195072            0.00000000             0.0000000             0.0000000
17             0.3328953             0.2004080            0.00000000             0.0000000             0.0000000
18             0.4231951             0.7227343            0.00000000             0.0000000             0.0000000
19             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
20             0.3434740            -0.2250347            0.00000000             0.0000000             0.0000000
21             0.5404054            -0.3538571            0.00000000             0.0000000             0.0000000
22             0.4406083            -0.2195072            0.00000000             0.0000000             0.0000000
23             0.3328953             0.2004080            0.00000000             0.0000000             0.0000000
24             0.4231951             0.7227343            0.00000000             0.0000000             0.0000000
25             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
26             0.0000000             0.0000000           -0.10236787             0.3434740            -0.2250347
27             0.0000000             0.0000000           -0.05914652             0.5404054            -0.3538571
28             0.0000000             0.0000000            0.40630077             0.4406083            -0.2195072
29             0.0000000             0.0000000            0.41932830             0.3328953             0.2004080
30             0.0000000             0.0000000           -0.14592934             0.4231951             0.7227343
31             0.0000000             0.0000000            0.00000000             0.0000000             0.0000000
32             0.0000000             0.0000000           -0.10236787             0.3434740            -0.2250347
33             0.0000000             0.0000000           -0.05914652             0.5404054            -0.3538571
34             0.0000000             0.0000000            0.40630077             0.4406083            -0.2195072
35             0.0000000             0.0000000            0.41932830             0.3328953             0.2004080
36             0.0000000             0.0000000           -0.14592934             0.4231951             0.7227343
attr(,"assign")
[1] 0 1 2 2 2 3 3 4 4 4 4 4 4
attr(,"contrasts")
attr(,"contrasts")$Design$Cell
[1] "contr.treatment"

In this case, does that make sens to to conduct the F-test on the last 6 parameters to detect genes with different trends between any pairs of cell lines with Sig_LIMMA_All_TT <- topTable(Fit2_LIMMA, coef=c(8:13), adjust="BH", number = Inf) ?

Or do I need to conduct the test in a pairwise way using  ? :

Sig_LIMMA_A_B <- topTable(Fit2_LIMMA, coef=c(8:10, ), adjust="BH", number = Inf)
Sig_LIMMA_A_C <- topTable(Fit2_LIMMA, coef=c(11:13), adjust="BH", number = Inf)

And finally, how to look for differences between B and C in this way ? I assume I have to manually define the contrast using makeContrasts() but I don't really understand how to properly use it.

I am not sure how to adapt the approach you've described in your previous post to this more complicated design matrix.

Cheers,

Pef

2

You need to make batch a factor. When you do so, the batch becomes fully nested within the cell type, so there's no point including the latter in the design matrix. Instead, you can do:

Batch <- factor(Design$Batch) Design_LIMMA_Cell <- model.matrix(~0 + Batch + Splines:Design$Cell)

I use an intercept-free model because it's easier to explain. In particular, if you look at the coefficients, you get:

 [1] "Batch1"                "Batch2"                "Batch3"
[4] "Batch4"                "Batch5"                "Batch6"
[7] "Splines1:Design$CellA" "Splines2:Design$CellA" "Splines3:Design$CellA" [10] "Splines1:Design$CellB" "Splines2:Design$CellB" "Splines3:Design$CellB"
[13] "Splines1:Design$CellC" "Splines2:Design$CellC" "Splines3:Design$CellC" The first 6 coefficients are the batch-specific intercepts of the spline, and can be ignored. This is because you're only interested in the response to time within each cell type, which are represented by the remaining cell type-specific spline coefficients. You can then deal with these like you did before with the previous design, i.e., containing just A and B. Edit: A bit of clarification; to do the comparisons between A and B here, you'll have to do a bit more work: # Substitute ':' and '$' with '.' in colnames(design)
con <- makeContrasts(Splines1.Design.CellA - Splines1.Design.CellB,
Splines2.Design.CellA - Splines2.Design.CellB,
Splines3.Design.CellA - Splines3.Design.CellB, levels=Design_LIMMA_Cell)

This is because the parametrization of the spline coefficients is different in an intercept-free model. Here, each set of spline coefficients represents that of the spline for each cell type, not the difference in splines between cell types.

Thanks a lot Aaron. This is far more clearer.

Using this new model matrix, if I want to highlight genes exhibiting any time effect, I should use : any.time <- topTable(Fit2_LIMMA_Cell, coef = c(7:15), n=Inf, sort.by="none"). Am I right ?

Thx !

1

Yep, that's right. That said, going back to the original question, if you want to identify genes with a time effect that is the same in A and B, then you don't necessarily care what happens in C; in such a case, you should only test for "any time effect" in A and B by themselves. This would be equivalent to coef=7:12.