I have been reviewing the R/limma manual (esp. sections 9.6.1 and 9.6.2), and need conceptual help in applying this.
In brief, I have a time series with 12 unevenly spaced timepoints. For Treatment I have 6 replicate Control individuals plus 8 replicate Affected individuals (14 people total, with repeated measurements over time). Call these variables of interest Time and Treatment.
I also have a covariate of interest recorded for each person and timepoint, measuring a phenotype as a continuous variable. Call the variable P.
I have been considering a regression spline, as in the example in section 9.6.2 (many time points), thinking this is more appropriate for my design.
(A) Is it correct to include a covariate with the regression spline? E.g., if I follow the example where X=ns(Time, df=number); design=model.matrix(~Treatment*X*P).
I am not sure if it's valid to make this sort of an ANCOVA-type regression, or how to modify it?
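To make the question concrete, here is a minimal sketch of the two designs I am weighing. The time values, the covariate values, and the object names (targets, design_full, design_add) are made up for illustration; only ns() and model.matrix() are the real calls from the manual's example. The additive version is just one modification I can think of, not something I know to be correct.

```r
library(splines)

# Hypothetical layout: 14 subjects, each measured at 12 unevenly spaced times.
times <- c(0, 1, 2, 3, 5, 7, 9, 12, 16, 20, 24, 30)  # made-up time values
targets <- expand.grid(Time = times, Subject = factor(1:14))

# Subjects 1-6 are Control, subjects 7-14 are Affected.
targets$Treatment <- factor(
  ifelse(as.integer(targets$Subject) <= 6, "Control", "Affected"),
  levels = c("Control", "Affected")
)

# Placeholder covariate, one value per subject and timepoint.
set.seed(1)
targets$P <- rnorm(nrow(targets))

# Natural spline basis for Time, as in the manual's example (df = 3 here).
X <- ns(targets$Time, df = 3)

# Full three-way model from my question.
design_full <- model.matrix(~ Treatment * X * P, data = targets)

# Possible simplification: P enters as an additive covariate only.
design_add <- model.matrix(~ Treatment * X + P, data = targets)

colnames(design_full)
colnames(design_add)
```

With df = 3 the full design has 16 columns and the additive one has 9, which is part of why I am unsure the three-way version is sensible.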
My primary contrasts of interest would be the Treatment:Time interaction (impact of treatment during a series of 4 baseline, 4 experimental, plus 4 recovery times), and the relation of covariate P on gene expression.
It might be nice to explore other interactions, but I am a bit concerned about over-parameterizing the model, about being able to interpret it correctly, and about testing too many contrasts. Perhaps I should keep the 3-way interaction in the design matrix and only select a few contrasts (Time:Treatment, P) to analyze.
I did see there was a 'global' correction for multiple contrasts.
(B) How do I select the df? The limma example suggests 3-5, but it has 16 Control + 16 Treatment individuals with no replication.
I could select 12 knots for my 12 timepoints (df=14, or 1+1+12 knots), and if the times were evenly spaced that would yield 14 datapoints (6 Control + 8 Affected subjects) in each knot-bounded interval, if my understanding is right.
Yet that would lead to a model with a large number of parameters (for a design=model.matrix(~Treatment*X*P) there are 52 columns in the matrix).
I'm not sure if that is valid for a dataset with only 14 subjects?
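For reference, this is roughly how I have been counting columns for different df values. It is only a sketch on a made-up sample table (the times, covariate values and object names are hypothetical); it shows how quickly the three-way design grows, not which df is right.

```r
library(splines)

# Hypothetical layout: 14 subjects, each measured at 12 unevenly spaced times.
times <- c(0, 1, 2, 3, 5, 7, 9, 12, 16, 20, 24, 30)  # made-up time values
targets <- expand.grid(Time = times, Subject = factor(1:14))
targets$Treatment <- factor(
  ifelse(as.integer(targets$Subject) <= 6, "Control", "Affected"),
  levels = c("Control", "Affected")
)
set.seed(1)
targets$P <- rnorm(nrow(targets))  # placeholder covariate values

# Count design-matrix columns of ~Treatment*X*P for several spline df values.
df_values <- c(3, 5, 12)
n_columns <- sapply(df_values, function(df) {
  X <- ns(targets$Time, df = df)
  ncol(model.matrix(~ Treatment * X * P, data = targets))
})
names(n_columns) <- paste0("df=", df_values)
n_columns
# 16, 24 and 52 columns respectively, i.e. 4 * (1 + df)
```

The 52-column case is the one I quoted above, set against 14 subjects and 168 samples in total.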
I could do a heuristic trial and error with different values for df, but I'm not sure how best to then evaluate and compare results of different models, since there is no AIC/BIC type output.
I assume I ultimately will need to add a Subjects factor to the design matrix and/or use the duplicateCorrelation command to account for repeated measurements (the 12 timepoints) on each subject.
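What I have in mind for the repeated measures is something like the sketch below, with Subject as a blocking factor. The expression matrix y is simulated noise and the sample table is made up; I am assuming the usual duplicateCorrelation() then lmFit() workflow from the manual applies here, which is part of what I would like confirmed.

```r
library(limma)    # duplicateCorrelation() also needs the statmod package
library(splines)

# Hypothetical layout: 14 subjects, each measured at 12 unevenly spaced times.
times <- c(0, 1, 2, 3, 5, 7, 9, 12, 16, 20, 24, 30)  # made-up time values
targets <- expand.grid(Time = times, Subject = factor(1:14))
targets$Treatment <- factor(
  ifelse(as.integer(targets$Subject) <= 6, "Control", "Affected"),
  levels = c("Control", "Affected")
)
set.seed(1)
targets$P <- rnorm(nrow(targets))  # placeholder covariate values

# Simulated log-expression matrix: 100 genes x 168 samples (pure noise).
y <- matrix(rnorm(100 * nrow(targets)), nrow = 100)

X <- ns(targets$Time, df = 3)
design <- model.matrix(~ Treatment * X + P, data = targets)

# Estimate the within-subject correlation, then fit with Subject as block.
corfit <- duplicateCorrelation(y, design, block = targets$Subject)
fit <- lmFit(y, design, block = targets$Subject,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# F-test on all Treatment:spline columns, i.e. any difference in time trend.
interaction_cols <- grep("^Treatment.*:X", colnames(design))
topTable(fit, coef = interaction_cols)
```

Selecting all the Treatment:X columns together follows the manual's spline example, where the interaction coefficients are tested jointly rather than one at a time.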
Thank you in advance for your help in setting up the model.matrix and spline functions.