Question: Limma time series spline help
2
3.9 years ago by
Hilary20
United States
Hilary20 wrote:

I have been reviewing the R/limma manual (esp. sections 9.6.1 and 9.6.2), and need conceptual help in applying this.

In brief, I have a time serise with 12 Times(unevenly spaced), and for my Treatment I have 6 replicate Control individuals plus 8 replicate Affected individuals (14 people total, with repeated measurements over time). Call these variables of interest Time and Treatment.
I also have a covariate of interest recorded for each person and timepoint, measuring a phenotype as a continuous variable. Call the variable P.

I have been considering using the regression spline as in example 9.6.2 many time points, thinking this is more appropriate.

My questions:
(A) Is it correct to include a covariate with the regression spline? E.g., if I follow the example where X=ns(Time, df=number); design=model.matrix(~Treatment*X*P).
I am not sure if it's valid to make this sort of an ANCOVA-type regression, or how to modify it?
My primary contrasts of interest would be the Treatment:Time interaction (impact of treatment during a series of 4 baseline, 4 experimental, plus 4 recovery times), and the relation of covariate P on gene expression.
It might be nice to explore other interactions, but I am a bit concerned with over-paramterizing the model and being able to interpret it correctly, and with testing too many contrasts. Perhaps I should have the 3-way interaction in the design matrix and only select a few contrasts (Time:Treatment, P) to analyze.
I did see there was a 'global' correction for multiple contrasts.

(B) How do I select the df? The limma example suggests 3-5, but it has 16 Control + 16 Treatment individuals with no replication.
I could select 12 knots for my 12 timepoints (df=14, or 1+1+12knots), and if the times were evenly spaced that would yield 14 datapoints (6 Control + 6 Treatment subjects) in each knot-bounded interval, if my understanding is right.
Yet that would lead to a model with a large number of parameters (for a design=model.matrix(~Treatment*X*PVT) there are 52 columns in the matrix).
I'm not sure if that is valid for a dataset with only 14 subjects?
I could do a heuristic trial and error with different values for df, but I'm not sure how best to then evaluate and compare results of different models, since there is no AIC/BIC type output.

I assume I ultimately will need to add a Subjects factor to the design matrix and/or use the duplicateCorrelation command to account for repeated measurements (the 12 timepoints) on each subject.

Thank you in advance for your help in setting up the model.matrix and spline functions.

Hilary

modified 3.9 years ago by Aaron Lun24k • written 3.9 years ago by Hilary20
Answer: Limma time series spline help
6
3.9 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Spline coefficients are difficult to interpret, even in simple designs. A 3-way interaction model involving splines would be quite messy, and you probably won't have enough samples to accommodate it (i.e., insufficient residual d.f.). Instead, I would probably do something like this:

> # Assuming treatment is "A" or "B"
> X <- ns(Time, df=3)
> design <- model.matrix(~0 + Treatment + Treatment:X + P)
> colnames(design)
[1] "TreatmentA"    "TreatmentB"    "P"             "TreatmentA:X1"
[5] "TreatmentB:X1" "TreatmentA:X2" "TreatmentB:X2" "TreatmentA:X3"
[9] "TreatmentB:X3"

This splits up the spline coefficients into treatment-specific columns, so that you can examine the effect of time for each treatment (if you drop all the Treatment:X interaction terms for one treatment); or so that you can test for treatment-specific response to time (if you set up a multi-column contrast equating all of the matching interaction terms between treatments); or so that you can test for any DE between treatments (if you set up a multi-column contrast equating the interaction terms and Treatment terms between treatments). Similarly, if you want to look at the effect of P, you can just drop the corresponding coefficient. You could consider fitting splines with respect to P as well, though you'd be running out of residual d.f. fairly quickly at this point.

As for the number of d.f. in the spline; if the number of d.f. (and spline coefficients) is greater than the number of samples, then your model is obviously overparameterized. Even more so in this case, if you're looking at interactions between the spline and treatment, as this will double the final number of coefficients in the model. I'd stick to the recommended number of knots; as long as it gives you a flexible fit that handles the time-dependent expression trends for most genes, then it should be sufficient.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by Aaron Lun24k

Thank you very much. This is very helpful. I have a couple more questions, and especially am hoping you wouldn't mind scanning my contrasts (coef) setup below to see if I correctly interpreted what you said. Mainly I'm uncertain of my setup because I typically am able to make one contrasts matrix for all factors I want to test, and in this case I wasn't able to figure out how to do so.

First for specifying the contrast matrix:

#Step i. To test for main effect of the covariate P: I'm not sure how I would do this with makeContrasts (?), so I used coef

>fit = lmFit(y, design, block=subject, correlation=corfit$consensus); fit = eBayes,fit); topTable(fit, coef="p") #Step ii. To test for treatment-specific response to time, i.e. the interaction effect >fit = lmFit(y, design, block=subject, correlation=corfit$consensus);

cont.mat = makeContrasts(TreatmentA.X1-TreatmentB.X1, TreatmentA.X2-TreatmentB.X2, TreatmentA.X3-TreatmentB.X3, levels=make.names(colnames(design)));

fit2 =contrasts.fit(fit, cont.mat);

fit2=eBayes(fit2);

topTable(fit2)

#Step iii. To test for effect of treatment as main effect, regardless of time: as you said a multi-column contrast equating matching interaction terms and treatment

--Same as Step ii. above but with cont.mat = makeContrasts(TreatmentA-TreatmentB, TreatmentA.X1-TreatmentB.X1, TreatmentA.X2-TreatmentB.X2, TreatmentA.X3-TreatmentB.X3, levels=make.names(colnames(design)))

Second, when you mention staying with the recommended 3-5 knots "as long as it gives you a flexible fit that handles the time-dependent expression trends for most genes." How can I tell if the model/knot choice does so? I just found that limma does have an AIC/BIC testing option with selectModel, but I'm not sure if that's the best method or applicable with splines.

Thank you again!

~Hilary

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Hilary20

For starters, I can't say if your use of block and correlation in lmFit is correct, because I don't know the precise details of your experiment with regards to the origin of each sample. If subject refers to the patient from which each sample was derived, and you have 14 different individuals, then it doesn't seem to make any sense to block on them.

Anyway, the contrasts you set up are mostly right. For the first contrast, make sure the coef matches the name in the design matrix, e.g., use capital "P" if that's what it's called. For the second contrast, I assume you've renamed all interaction terms to replace ":" with "." as the former doesn't work in makeContrasts. You also don't need to re-run lmFit, you can just re-use the fit object from the first contrast. You're correct in running topTable without specifying coef; this will test all contrasts in the matrix at once, which is necessary when you're contrasting spline coefficients.

For the last two contrasts, I wouldn't put any particular weight in the reported log-fold changes. This refers to changes in the spline coefficients and are pretty difficult to interpret. Instead, I'd just plot out the expression trends with respect to time for significant genes, to get a better picture of how they're behaving.

Finally, I usually evaluate model designs informally, by seeing whether a larger model results in a substantial increase (e.g., more than 50%) in the number of DE genes. If so, I use it; otherwise, I keep the simpler model. In your case, though, you don't have a lot of residual d.f. to spare, so I'd stick with df=3 in the spline.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by Aaron Lun24k

Thank you again; you have been very helpful! Because the 12 timepoints are repeated measurements on each of the 14 subjects (12*14 for 168 arrays), I assume it is necessary to use blocking based on the post in LIMMA on matched time series data. I will be sure to plot the genes and use this to interpret rather than relying on fold change; thank you!

Best,

Hilary

I am working on an experimental design very similar to the one you have posted above Aaron, and I had a few questions regarding your response. Could you explicitly write out how each of your suggestions would look using the model.matrix above? For example, I'm trying to do three things with my dataset which I've listed below:

topTable(fit2, coef=c(4:9)) #Identify changes over time (trend) between the two treatments
topTable(fit2, coef=c(4,6,8)) #Identify changes over time in Treatment A
topTable(fit2, coef=c(5,7,9)) #Identify changes over time in Treatment B


How does the output contrast below compare/differ from the first line of code above in terms of interpreting the output?

cont.mat = makeContrasts(TreatmentA.X1-TreatmentB.X1, TreatmentA.X2-TreatmentB.X2, TreatmentA.X3-TreatmentB.X3, levels=make.names(colnames(design)))

ADD REPLYlink modified 5 months ago • written 5 months ago by treleo0

Make a new question.