Question

LIMMA on matched time series data

0

Entering edit mode

Sandhya Pemmasani Kiran ▴ 40

@sandhya-pemmasani-kiran-5878

Last seen 7.2 years ago

India

Hi,

I have timecourse microarray data generated from cell culture experiment. Three replicates were taken from the culture (three biological replicates - R1, R2 and R3), and, further on each replicate three treatments were given (T1, T2 and T3). For each treatment, cells were harvested at 12hrs, 24hrs, 48hrs and 72 hrs (H1, H2, H3 and H4, respectively). For all the replicates there was a common starting point of 0hrs without any treatment.

FileName   Subject   Group   Time
Ctrl_0h_Rep1   R1   Ctrl   H0
Ctrl_0h_Rep2   R2   Ctrl   H0
Ctrl_0h_Rep3   R3   Ctrl   H0
T1_12h_Rep1   R1   T1   H1
T1_12h_Rep2   R2   T1   H1
T1_12h_Rep3   R3   T1   H1
T1_24h_Rep1   R1   T1   H2
T1_24h_Rep2   R2   T1   H2
T1_24h_Rep3   R3   T1   H2
T1_48h_Rep1   R1   T1   H3
T1_48h_Rep2   R2   T1   H3
T1_48h_Rep3   R3   T1   H4
-   -   -   -   -
-   -   -   -   -
-   -   -   -   -
T3_72h_Rep1   R1   T3   H5
T3_72h_Rep2   R2   T3   H5
T3_72h_Rep3   R3   T3   H5

Objective of the experiment is to identify genes that are specifically differentially regulated on each treatment.

As treatment samples are related within each replicate (can I take them as independent??), I am using duplicateCorrelation() function in limma. Here is the code I am using-

Time <- as.numeric(factor(targets$Time))
Group <- factor(targets$Group)
X <- ns(Time,df=3)
design <- model.matrix(~0+Group+Group:X)

corfit <- duplicateCorrelation(dt,design,block=targets$Subject)
corfit$consensus

colnames(design) <- make.names(colnames(design))
fit <- lmFit(dt,design, block=targets$Subject,correlation=corfit$consensus)
fit <- eBayes(fit)

### T2 vs T1

cont.mat <- makeContrasts(GroupT2.X1-GroupT1.X1, GroupT2.X2-GroupT1.X2, GroupT2.X3-GroupT1.X3, levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

#### T3 vs T2

cont.mat <- makeContrasts(GroupT3.X1-GroupT2.X1, GroupT3.X2-GroupT2.X2, GroupT3.X3-GroupT2.X3,levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

#### T3 vs T1

cont.mat <- makeContrasts(GroupT3.X1-GroupT1.X1, GroupT3.X2-GroupT1.X2, GroupT3.X3-GroupT1.X3,levels=design)
fit2 <- contrasts.fit(fit,cont.mat)
fit2 <- eBayes(fit2)
topTable(fit2)

I am planning to use Venn diagrams to get Treatment specific genes.

When I run this code, I am facing problems with Ctrl samples, as expected. Is there any other way to add these samples ? or should I remove them from the analysis ?

- In spite of using the splines, can I in any way extract genes that have minimum 2 fold change in at least one time point?

- Why am I getting very less number of genes with adj p-value < 0.1? If I decrease the df of spline curve to 2, numbers are decreasing further.

- Is limma the better way to do this kind of analysis or should I shift to any other package?

Thanks,

Sandhya

timecourse limma • 5.3k views

ADD COMMENT • link updated 10.5 years ago by Aaron Lun ★ 29k • written 10.5 years ago by Sandhya Pemmasani Kiran ▴ 40

0

Entering edit mode

I presume T1_48h_Rep3 is meant to have time H3, and T3_72h_Rep1 is meant to have time H4.

ADD REPLY • link 10.5 years ago Aaron Lun ★ 29k

score 2 · Answer 1 · 2015-07-17

2

Entering edit mode

Aaron Lun ★ 29k

@alun

Last seen 5 hours ago

The city by the bay

If I understand your data set correctly, your control samples represent time zero for each treatment effect. Your current design will not capture this, as the control samples have their own coefficient; any changes in expression in the control samples will have no effect on the spline fit. Instead, you could do something like this:

> design <- model.matrix(~Group:X)
> design <- design[,colSums(abs(design))>0]
> colnames(design)
 [1] "(Intercept)" "GroupT1:X1"  "GroupT2:X1"  "GroupT3:X1"  "GroupT1:X2"
 [6] "GroupT2:X2"  "GroupT3:X2"  "GroupT1:X3"  "GroupT2:X3"  "GroupT3:X3"

In this manner, the set of samples for each treatment still gets fitted to its own spline, but all treatments are using the control samples as the zero time-point. You can then do your comparisons that you've listed above.

As for your second question; the absolute values of the spline coefficients aren't easy to interpret. If you want to enforce some minimum fold change, I would normally suggest using treat, but (at least, for me) there's no intuitive relationship between the value of the coefficients and the fold change at a given time point. Probably the simpler approach is to fit a one-way model with each time point as a separate group, and then use treat with the specified fold-change threshold to compare treatments at each time point.

For your third question; the number of DE genes depends on a number of things, such as the variability of your replicates, the appropriateness of your design and the true amount of DE in the data. You should check MDS plots to see that your replicates are clustered together, and that samples are well-separated by the treatment or time effects. If it looks okay there, then the next question is whether the design is appropriate. One possibility is that the spline is not fitting well, which results in larger variances and loss of power (which is exacerbated when you reduce the spline d.f., as the spline fit will deteriorate). If this is the case, you could try increasing the spline d.f., or try using the one-way model that I described above. Both should result in a better fit at the cost of some residual d.f.

Finally, limma is entirely appropriate for this type of analysis. I'd struggle to think of alternatives if I couldn't do it with limma.

ADD COMMENT • link 10.5 years ago Aaron Lun ★ 29k

0

Entering edit mode

Hi Aaron,

Firstly, sorry for the mistake in design matrix. Here is the corrected one.

FileName   Subject   Group   Time
Ctrl_0h_Rep1   R1   Ctrl   0
Ctrl_0h_Rep2   R2   Ctrl   0
Ctrl_0h_Rep3   R3   Ctrl   0
T1_12h_Rep1   R1   T1   12
T1_12h_Rep2   R2   T1   12
T1_12h_Rep3   R3   T1   12
T1_24h_Rep1   R1   T1   24
T1_24h_Rep2   R2   T1   24
T1_24h_Rep3   R3   T1   24
T1_48h_Rep1   R1   T1   48
T1_48h_Rep2   R2   T1   48
T1_48h_Rep3   R3   T1   48
T1_72h_Rep2   R2   T1   72
T1_72h_Rep3   R3   T1   72
T2_12h_Rep1   R1   T2   12
T2_12h_Rep2   R2   T2   12
T2_12h_Rep3   R3   T2   12
T2_24h_Rep1   R1   T2   24
T2_24h_Rep2   R2   T2   24
T2_24h_Rep3   R3   T2   24
T2_48h_Rep1   R1   T2   48
T2_48h_Rep2   R2   T2   48
T2_48h_Rep3   R3   T2   48
T2_72h_Rep2   R2   T2   72
T2_72h_Rep3   R3   T2   72
T3_12h_Rep1   R1   T3   12
T3_12h_Rep2   R2   T3   12
T3_12h_Rep3   R3   T3   12
T3_24h_Rep1   R1   T3   24
T3_24h_Rep2   R2   T3   24
T3_24h_Rep3   R3   T3   24
T3_48h_Rep1   R1   T3   48
T3_48h_Rep2   R2   T3   48
T3_48h_Rep3   R3   T3   48
T3_72h_Rep1   R1   T3   72
T3_72h_Rep2   R2   T3   72
T3_72h_Rep3   R3   T3   72

Thank you for the suggestions, I have incorporated the changes in the design matrix. So, we have removed group factor form the model and kept only Interactions.

I have few more queries-

After increasing the spline d.f. to 4 (from 3), I got 20 DE genes (16 with 3 d.f.) for T2 vs T1. There are considerable number of genes for other comparisons and the genes are relevant to our research objective. MDS plot also looks fine with samples clustered based on Time point. May be there is less variability in T2 and T1.

As you suggested, I have built one-way model as well, to check T2 vs T1.

TS <- paste(targets$Group, targets$Time, sep=".")
TS <- factor(TS, levels=unique(TS))
design <- model.matrix(~0+TS)
colnames(design) <- levels(TS)

fit <- lmFit(dt, design)

cont.dif <- makeContrasts(
Dif12hr =T2.12-T1.12,
Dif24hr =(T2.24-T2.12)-(T1.24-T1.12),
Dif48hr=(T2.48-T2.24)-(T1.48-T1.24),
Dif72hr=(T2.72-T2.48)-(T1.72-T1.48),
levels=design)
fit2 <- contrasts.fit(fit, cont.dif)
fit2 <- eBayes(fit2)
topTableF(fit2)

This way also I got only 8 genes with FDR 0.1. Do I need to add any extra contrasts here? Am I covering all the required contrasts to check Treatment differences w.r.t time course profiles?

Apart from 0hr observations, I have only 4 time points. Should I use splines for such smaller number of time points?

I understand that it is difficult to interpret spline coefficients. Can I use them for any other purpose, like, plotting?

Thanks,

Sandhya

ADD REPLY • link 10.5 years ago Sandhya Pemmasani Kiran ▴ 40

0

Entering edit mode

Your one-way model looks fine, if you're testing for differences in the time effect between treatments. But don't forget to use duplicateCorrelation for the Subject blocking factor.
If the spline d.f. is equal to the number of time points (minus 1, to be specific, as the intercept is implicit), you might as well fit the one-way model. The final design matrix with splines will have the same residual d.f. as that of a one-way layout, but the latter is easier to interpret.
I've never used spline coefficients for plotting, but I guess you could do so. You could use predict to get the spline basis for a vector of covariate values (i.e., time points), and then matrix-multiply the spline basis by the estimated spline coefficients to get the fitted values of the spline.

ADD REPLY • link 10.5 years ago Aaron Lun ★ 29k