Hi,
I'm investigating changes in gene expression after a body weight loss intervention. The data include baseline measurements and two follow-up measurements. The 40 participants have varying body weights at baseline, and they lose varying amounts of body weight during the study. My main model (see m1 below) has the standard structure used in this scenario. However, because the participants start with different body weights and lose different amounts, I would also like to model the intervention "effect" using the percentage of body weight lost from baseline to the specific visit as the explanatory variable. To do this, I ended up with a second model (see m2 below), but I'm unsure whether it is suitable for answering my question.
I would greatly appreciate any help and/or suggestions.
Best, Jari
# covariate data
participant <- factor(rep(1:4, each = 3))
visit <- factor(rep(c("bl", "fup1", "fup2"), times = 4), levels = c("bl", "fup1", "fup2"))
wlp <- c(0, -7, -15, 0, -12, -24, 0, -6, -13, 0, -9, -18)
# model 1, standard time effect (using contrast to compare fup1 and fup2 afterwards)
m1 <- model.matrix(~ 0 + participant + visit)
colnames(m1)
# model 2, "intervention effect" by body weight loss percentage
m2 <- model.matrix(~ 0 + participant + visit:wlp)
m2 <- m2[, colnames(m2) != "visitbl:wlp"] # remove "visitbl:wlp" column (all zero, because wlp is 0 at baseline)
colnames(m2) <- gsub(":", "_", colnames(m2))
colnames(m2)
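A quick sanity check of the two designs above can be sketched in base R. It uses only the toy covariates from the post; the name m2_full is introduced here purely for illustration. It confirms that the baseline interaction column is all zero (wlp is 0 at every baseline visit), and that after dropping it both designs have the same number of columns and full column rank.

```r
# Toy covariates copied from the post above
participant <- factor(rep(1:4, each = 3))
visit <- factor(rep(c("bl", "fup1", "fup2"), times = 4), levels = c("bl", "fup1", "fup2"))
wlp <- c(0, -7, -15, 0, -12, -24, 0, -6, -13, 0, -9, -18)

m1 <- model.matrix(~ 0 + participant + visit)

# m2_full is an illustrative name for m2 before the redundant column is dropped
m2_full <- model.matrix(~ 0 + participant + visit:wlp)

# The baseline interaction column is all zero because wlp is 0 at baseline
all(m2_full[, "visitbl:wlp"] == 0)

# Drop the column by name rather than by position
m2 <- m2_full[, colnames(m2_full) != "visitbl:wlp"]

# Compare the number of columns and the column rank of the two designs
c(ncol(m1), ncol(m2))
c(qr(m1)$rank, qr(m2)$rank)
```

Dropping the column by name keeps the code correct if the number of participants changes, which matters when moving from this 4-participant example to the real 40-participant data.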
Thank you so much, Gordon!
The weight loss actually differs more in the real data and some participants even regain body weight between follow-ups 2 and 3. I'm sorry that my data example did not show this variability.
The idea with m2 was to try to find the association between the change in relative body weight (wlp) and gene expression. Including only the interaction term in m2 results in the same number of terms as in m1. The main difference between the two models is that in m1 the predictors are dichotomous, whereas in m2 they are continuous (see the figures of the model matrices below). Therefore, I was wondering whether this design could capture the "effect" related to the change in wlp. The two models produce very similar sets of DE genes. In m1, I would interpret the fold changes as the DE between baseline and the follow-up point, and in m2 as the fold change between baseline and the follow-up point per one percent of weight loss. What do you think?

The model you suggested
m2 <- model.matrix(~ participant + wlp)
indeed gives very similar DE to m1. Thank you also for the suggestion on m3; the two explanatory variables mostly cancel each other out.

I've already told you how I would analyse the data.
Your model m2 represents a quite artificial model and it doesn't correspond very closely to what you said you wanted to do. It assumes no expression changes at either followup time in the absence of weight change, which seems a very strong assumption. It also allows completely different responses to weight changes at the two followups, which seems unintuitive. I don't see the logic of a visit by wlp interaction. But honestly, if you know what model m2 represents and you want to use it, then go ahead, it's your analysis. On the other hand, if you're not confident of what the model means, then I wonder why you are proposing it.

Anyway, you're asking for scientific analysis advice here rather than for advice on how to use the software. I can only give limited scientific advice because I don't know the data or the scientific context.
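The two structural points above can be checked on the toy covariates from the original post. This is only a base-R sketch: the names m_single, same_slope, wlp0 and m2_zero are made up for illustration, and the zero weight change for participant 1 is a hypothetical value, not part of the data.

```r
# Toy covariates copied from the original post
participant <- factor(rep(1:4, each = 3))
visit <- factor(rep(c("bl", "fup1", "fup2"), times = 4), levels = c("bl", "fup1", "fup2"))
wlp <- c(0, -7, -15, 0, -12, -24, 0, -6, -13, 0, -9, -18)

# Visit-specific slope design (the all-zero baseline column dropped by name)
m2 <- model.matrix(~ 0 + participant + visit:wlp)
m2 <- m2[, colnames(m2) != "visitbl:wlp"]

# Single-slope design quoted earlier in the thread
m_single <- model.matrix(~ participant + wlp)

# Separate slopes per followup: constraining the two slopes to be equal
# collapses the two interaction columns into the single wlp column
same_slope <- m2[, "visitfup1:wlp"] + m2[, "visitfup2:wlp"]
all.equal(unname(same_slope), unname(m_single[, "wlp"]))

# No change without weight change: the design has no visit main effect, so a
# followup row with wlp = 0 is identical to that participant's baseline row.
# HYPOTHETICAL: pretend participant 1 had zero weight change at fup1.
wlp0 <- wlp
wlp0[2] <- 0
m2_zero <- model.matrix(~ 0 + participant + visit:wlp0)
identical(unname(m2_zero[1, ]), unname(m2_zero[2, ]))
```

Both checks should return TRUE. The first shows that the single-slope model is the special case of m2 in which the response to weight change is the same at both followups. The second shows that m2 predicts exactly the baseline expression for any followup sample with no weight change.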
Thank you! You are correct. I was just wondering whether m2 could offer an alternative way to analyze the data. I'll continue as you proposed.