Question: LIMMA on matched time series data
0
3.9 years ago by
India

Hi,

I have timecourse microarray data generated from cell culture experiment. Three replicates were taken from the culture (three biological replicates - R1, R2 and R3), and, further on each replicate three treatments were given (T1, T2 and T3). For each treatment, cells were harvested at 12hrs, 24hrs, 48hrs and 72 hrs (H1, H2, H3 and H4, respectively). For all the replicates there was a common starting point of 0hrs without any treatment.

FileName    Subject    Group    Time
Ctrl_0h_Rep1    R1    Ctrl    H0
Ctrl_0h_Rep2    R2    Ctrl    H0
Ctrl_0h_Rep3    R3    Ctrl    H0
T1_12h_Rep1    R1    T1    H1
T1_12h_Rep2    R2    T1    H1
T1_12h_Rep3    R3    T1    H1
T1_24h_Rep1    R1    T1    H2
T1_24h_Rep2    R2    T1    H2
T1_24h_Rep3    R3    T1    H2
T1_48h_Rep1    R1    T1    H3
T1_48h_Rep2    R2    T1    H3
T1_48h_Rep3    R3    T1    H4
-    -    -    -    -
-    -    -    -    -
-    -    -    -    -
T3_72h_Rep1    R1    T3    H5
T3_72h_Rep2    R2    T3    H5
T3_72h_Rep3    R3    T3    H5

Objective of the experiment is to identify genes that are specifically differentially regulated on each treatment.

As treatment samples are related within each replicate (can I take them as independent??), I am using duplicateCorrelation() function in limma. Here is the code I am using-

Time <- as.numeric(factor(targets$Time)) Group <- factor(targets$Group)
X <- ns(Time,df=3)
design <- model.matrix(~0+Group+Group:X)

corfit <- duplicateCorrelation(dt,design,block=targets$Subject) corfit$consensus

colnames(design) <- make.names(colnames(design))
fit <- lmFit(dt,design, block=targets$Subject,correlation=corfit$consensus)
fit <- eBayes(fit)

### T2 vs T1

cont.mat <- makeContrasts(GroupT2.X1-GroupT1.X1, GroupT2.X2-GroupT1.X2, GroupT2.X3-GroupT1.X3, levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

#### T3 vs T2

cont.mat <- makeContrasts(GroupT3.X1-GroupT2.X1, GroupT3.X2-GroupT2.X2, GroupT3.X3-GroupT2.X3,levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

#### T3 vs T1

cont.mat <- makeContrasts(GroupT3.X1-GroupT1.X1, GroupT3.X2-GroupT1.X2, GroupT3.X3-GroupT1.X3,levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

I am planning to use Venn diagrams to get Treatment specific genes.

When I run this code, I am facing problems with Ctrl samples, as expected. Is there any other way to add these samples ? or should I remove them from the analysis ?

- In spite of using the splines, can I in any way extract genes that have minimum 2 fold change in at least one time point?

- Why am I getting very less number of genes with adj p-value < 0.1? If I decrease the df of spline curve to 2, numbers are decreasing further.

- Is limma the better way to do this kind of analysis or should I shift to any other package?

Thanks,

Sandhya

timecourse limma • 1.7k views
modified 3.9 years ago by Aaron Lun24k • written 3.9 years ago by Sandhya Pemmasani Kiran40

I presume T1_48h_Rep3 is meant to have time H3, and T3_72h_Rep1 is meant to have time H4.

Answer: LIMMA on matched time series data
2
3.9 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

If I understand your data set correctly, your control samples represent time zero for each treatment effect. Your current design will not capture this, as the control samples have their own coefficient; any changes in expression in the control samples will have no effect on the spline fit. Instead, you could do something like this:

> design <- model.matrix(~Group:X)
> design <- design[,colSums(abs(design))>0]
> colnames(design)
[1] "(Intercept)" "GroupT1:X1"  "GroupT2:X1"  "GroupT3:X1"  "GroupT1:X2"
[6] "GroupT2:X2"  "GroupT3:X2"  "GroupT1:X3"  "GroupT2:X3"  "GroupT3:X3"


In this manner, the set of samples for each treatment still gets fitted to its own spline, but all treatments are using the control samples as the zero time-point. You can then do your comparisons that you've listed above.

As for your second question; the absolute values of the spline coefficients aren't easy to interpret. If you want to enforce some minimum fold change, I would normally suggest using treat, but (at least, for me) there's no intuitive relationship between the value of the coefficients and the fold change at a given time point. Probably the simpler approach is to fit a one-way model with each time point as a separate group, and then use treat with the specified fold-change threshold to compare treatments at each time point.

For your third question; the number of DE genes depends on a number of things, such as the variability of your replicates, the appropriateness of your design and the true amount of DE in the data. You should check MDS plots to see that your replicates are clustered together, and that samples are well-separated by the treatment or time effects. If it looks okay there, then the next question is whether the design is appropriate. One possibility is that the spline is not fitting well, which results in larger variances and loss of power (which is exacerbated when you reduce the spline d.f., as the spline fit will deteriorate). If this is the case, you could try increasing the spline d.f., or try using the one-way model that I described above. Both should result in a better fit at the cost of some residual d.f.

Finally, limma is entirely appropriate for this type of analysis. I'd struggle to think of alternatives if I couldn't do it with limma.

Hi Aaron,

Firstly, sorry for the mistake in design matrix. Here is the corrected one.

FileName    Subject    Group    Time
Ctrl_0h_Rep1    R1    Ctrl    0
Ctrl_0h_Rep2    R2    Ctrl    0
Ctrl_0h_Rep3    R3    Ctrl    0
T1_12h_Rep1    R1    T1    12
T1_12h_Rep2    R2    T1    12
T1_12h_Rep3    R3    T1    12
T1_24h_Rep1    R1    T1    24
T1_24h_Rep2    R2    T1    24
T1_24h_Rep3    R3    T1    24
T1_48h_Rep1    R1    T1    48
T1_48h_Rep2    R2    T1    48
T1_48h_Rep3    R3    T1    48
T1_72h_Rep2    R2    T1    72
T1_72h_Rep3    R3    T1    72
T2_12h_Rep1    R1    T2    12
T2_12h_Rep2    R2    T2    12
T2_12h_Rep3    R3    T2    12
T2_24h_Rep1    R1    T2    24
T2_24h_Rep2    R2    T2    24
T2_24h_Rep3    R3    T2    24
T2_48h_Rep1    R1    T2    48
T2_48h_Rep2    R2    T2    48
T2_48h_Rep3    R3    T2    48
T2_72h_Rep2    R2    T2    72
T2_72h_Rep3    R3    T2    72
T3_12h_Rep1    R1    T3    12
T3_12h_Rep2    R2    T3    12
T3_12h_Rep3    R3    T3    12
T3_24h_Rep1    R1    T3    24
T3_24h_Rep2    R2    T3    24
T3_24h_Rep3    R3    T3    24
T3_48h_Rep1    R1    T3    48
T3_48h_Rep2    R2    T3    48
T3_48h_Rep3    R3    T3    48
T3_72h_Rep1    R1    T3    72
T3_72h_Rep2    R2    T3    72
T3_72h_Rep3    R3    T3    72

Thank you for the suggestions, I have incorporated the changes in the design matrix. So, we have removed group factor form the model and kept only Interactions.

I have few more queries-

After increasing the spline d.f. to 4 (from 3), I got 20 DE genes (16 with 3 d.f.) for T2 vs T1. There are considerable number of genes for other comparisons and the genes are relevant to our research objective. MDS plot also looks fine with samples clustered based on Time point. May be there is less variability in T2 and T1.

As you suggested, I have built one-way model as well, to check T2 vs T1.

TS <- paste(targets$Group, targets$Time, sep=".")
TS <- factor(TS, levels=unique(TS))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)

fit <- lmFit(dt, design)

cont.dif <- makeContrasts(
Dif12hr =T2.12-T1.12,
Dif24hr =(T2.24-T2.12)-(T1.24-T1.12),
Dif48hr=(T2.48-T2.24)-(T1.48-T1.24),
Dif72hr=(T2.72-T2.48)-(T1.72-T1.48),
levels=design)
fit2 <- contrasts.fit(fit, cont.dif)
fit2 <- eBayes(fit2)
topTableF(fit2)

This way also I got only 8 genes with FDR 0.1. Do I need to add any extra contrasts here? Am I covering all the required contrasts to check Treatment differences w.r.t time course profiles?

Apart from 0hr observations, I have only 4 time points. Should I use splines for such smaller number of time points?

I understand that it is difficult to interpret spline coefficients. Can I use them for any other purpose, like, plotting?

Thanks,

Sandhya

• Your one-way model looks fine, if you're testing for differences in the time effect between treatments. But don't forget to use duplicateCorrelation for the Subject blocking factor.
• I've never used spline coefficients for plotting, but I guess you could do so. You could use predict to get the spline basis for a vector of covariate values (i.e., time points), and then matrix-multiply the spline basis by the estimated spline coefficients to get the fitted values of the spline.