Hi everyone -
I was hoping to get your help understanding when it is appropriate to use a 0-intercept model. I have log2-transformed, quantile-normalized fluorescence values off of a series of microarrays. I would like to determine the effects of treatment X vs treatment Y on each gene measured in a patient at compared to baseline. I have 50 patients undergoing treatment X, and a different set of 50 patients undergoing treatment Y (n=50 in each arm, n=100 in the experiment). I have measurements for each patient at baseline, and two subsequent timepoints (I will call these timepoints 1 [baseline], 2, and 3), with timepoint 3 being the end of the study.
My mapping data looks like the following:
PID Treatment Timepoint Treat_Time P01C5553301 P01C555 Y 1 Y_1 P01C5553303 P01C555 Y 3 Y_3 P01C6663301 P01C666 X 1 X_1 P01C6663303 P01C666 X 3 X_3 P01C7773301 P01C777 X 1 X_1 P01C7773303 P01C777 X 3 X_3
To set up my linear model design, I read through several answers on Bioconductor and came up with the following:
design1 <- model.matrix(~ 0 + Treat_Time + PID, data = map.samples1) colnames(design1)[1:4] <- levels(Treat_Time) fit1 <- lmFit(eset1, design1) cont.dif <- makeContrasts( DeltaDelta = (X_2 - X_1) - (Y_2 - Y_1), levels = design1)
My goal was to eventually develop a contrast matrix that would look at the difference between the two treatment groups, accounting for the random effects of patient. I have two questions:
First question: Does using a 0-intercept model make sense here? There are no probes that have 0 as the mean value, even in the absence of any covariates or main effects. Thus, to me it doesn't make sense to use a 0-intercept. However, so many of the answers that I have seen on Bioconductor regarding time-series or interaction terms include the 0 intercept, so I was starting to have doubts. In addition, if I don't use a 0-intercept, X_1 does not make it into the design matrix, i.e.
design1 <- model.matrix(~ 0 + Treat_Time + PID, data = map.samples1) head(colnames(design1)) "Treat_TimeX_1" "Treat_TimeX_2" "Treat_TimeY_1" "Treat_TimeY_2" "PIDP01C666" "PIDP01C777" design2 <- model.matrix(~ + Treat_Time + PID) head(colnames(design2)) "(Intercept)" "Treat_TimeX_2" "Treat_TimeY_1" "Treat_TimeY_2" "PIDP01C666" "PIDP01C777"
Therefore, I wouldn't be able to make the interaction contrast if I didn't have the 0 intercept. The more general question is: when is it appropriate to utilize a 0-intercept model? Does one need to center-scale/standardize the expression data first?
Second question: Am I modeling the interaction between treatment and time appropriately, while accounting for random effects? I have seen two methods of modeling the random effects of PID. The first method adds PID to the model as a covariate ( https://support.bioconductor.org/p/125149 ) while the second method uses duplicateCorrelation() ( https://support.bioconductor.org/p/70070 ). When is each method appropriate?
Thank you so much for all of your help. If there are any references that you feel I should read based on where you think my understanding is at with linear models and utilizing limma in general, please let me know!