Hi,
I have a diet study experiment, where I would like to test whether a continuous variable, cont.var,
is related with gene expression data.
1) I would like to find the effect of cont.var
within each category (GroupA/GroupB), and within each time-point (T0/T1). The gene expression data was collected at time T0 and then at time T1. I would like to test the effect of the continuous variable in each of the 4 groups (GroupA_T0, GroupA_T1, GroupB_T0, GroupB_T1).
2) Additionally, I have uneven number of observations for each subject. Each subject is either in GroupA or GroupB. So we have a confounding group and subject effect. Multiple but not equal number of samples are collected for each subjects at two time points, the number of observations per subject at a specific time point varies between 0 and 3. There are 70 subjects and 326 samples in total.
3) The might be a batch effect. And the information about sample batch (B1-B5) membership is known.
As I am interested in the effect of cont.var
within each group and at each time point I started modeling the data as follows (rawcount contains the raw expression data):
design <- model.matrix(~ 0 + Lane + Group + Timepoint + Timepoint:Group:cont.var, metadata) v <- voom(rawcount, design, lib.size = libSize, plot = FALSE) # Account for subject (individual) effect: corfit <- duplicateCorrelation(v, design, block=metadata$SubjectID) v <- voom(rawcount, design, lib.size = libSize, block = metadata$SubjectID, correlation = corfit$consensus, plot = plot) fit <- lmFit(v, design, block = metadata$SubjectID, correlation = corfit$consensus) fit <- lmFit(v, design) fit <- contrasts.fit(fit, contrast) fit <- eBayes(fit)
My question is whether my deisgn matrix was declared appropriately, and what should be the contrast parameter for contrasts.fit()?
I am interested in both:
a) whether there is an effect of the continuous variable for a particular gene at all within each group and each timepoint.
b) whether the effect of the continuous variable is different between 2 groups at each time point or between time points for each group individually.
I would really appreciate some help!
Thanks,
Lan
It should be a pretty straightforward extension. After running your code:
Look at the column names:
The first 8 coefficients are the same: the lane, group, and timepoint effects. Then, for each combination of group and time point, you now have 5 coefficients, X1 through X5. To test the continuous variable in each group/time, just test the five coefficients for that combination. Pass them as a vector, e.g.
c(9, 13, 17, 21, 25)
for Group A at time 0. This will perform an ANOVA of all 5 coefficients, giving you a single p-value for overall evidence of nonlinear response to the continuous variable. To test for differences in the nonlinear response to cont.var between two groups, you will have to form the 5 contrasts between corresponding spline coefficients for the two groups and perform an ANOVA on these 5 coefficients.As to whether you have enough data to fit splines with 5 df to your data, I honestly have no idea. I work primarily in RNA-seq and other similarly expensive data, so I've never had a data set large enough to make spline fits practical.
Note: Please use the "Add comment" button to reply to someone else's answer, instead of the "add your answer" box.