Question

Using a change in a continuous explanatory variable to test changes in gene expression with limma

0

Entering edit mode

jari.karppinen • 0

@56445275

Last seen 19 months ago

Finland

Hi,

I'm investigating changes in gene expression after a body weight loss intervention. The data includes baseline measurements and two follow-up measurements. The 40 participants have varying body weights at baseline, and they lose varying amounts of body weight during the study. My main model (see m1 below) has the standard structure used in this scenario. However, because the participants start with different body weights and lose varying amounts of body weight, I would additionally wish to model the intervention "effect" using the percentage of lost body weight from baseline to the specific visit as the explanatory variable. To achieve this aim, I ended up with a model (see m2 below), but I'm unsure whether it is applicable to answer my question.

I would greatly appreciate any help and/or suggestions.

Best, Jari

# covariate data
participant <- factor(rep(1:4, each = 3))
visit <- factor(rep(c("bl", "fup1", "fup2"), times = 4), levels = c("bl", "fup1", "fup2"))
wlp <- c(0, -7, -15, 0, -12, -24, 0, -6, -13, 0, -9, -18)

# model 1, standard time effect (using contrast to compare fup1 and fup2 afterwards)
m1 <- model.matrix(~ 0 + participant + visit)
colnames(m1)

# model 2, "intervention effect" by body weight loss percentage
m2 <- model.matrix(~ 0 + participant + visit:wlp)
m2 <- m2[,-5] # remove "visitbl:wlp" column (redundant)
colnames(m2) <- gsub(":", "_", colnames(m2))
colnames(m2)

limma • 1.2k views

ADD COMMENT • link updated 22 months ago by Gordon Smyth 53k • written 22 months ago by jari.karppinen • 0

score 1 · Answer 1 · 2024-04-03

1

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 9 hours ago

WEHI, Melbourne, Australia

Adding wlp into the model will be of limited value because, from the data you show, the weight losses are very consistent between participants and wlp is almost entirely confounded with visit.

You don't explain the reasoning by which you ended up with model m2, but it has too many terms in it and it does not test the hypothesis you want. [Edit later: I see from your comment below that model matrix m2 has the same number of columns as m1 once a column is manually removed, but it still doesn't seem a logical model to me. I don't see any motivation for an interaction model in this scientific context.]

You can try the two alternative models:

m1 <- model.matrix(~ participant + visit)
m2 <- model.matrix(~ participant + wlp)

If the second model gives just as much DE as the first, then that is evidence that expression changes between visits are largely explained by the weight loss.

If the participants vary a lot in terms of weight loss (more than in the toy data that you show), then you could also try

m3 <- model.matrix(~ participant + visit + wlp)

in order to check whether there are visit effects that are not explained by weight loss. Beware though that you need to be very careful in interpeting this model because wlp is almost entirely confounded with visit. This model can only be used to test the hypothesis that I mentioned. It is not suitable for general assessment of differential expression because the two predictors visit and wlp will interfere with each other and mask each other's effects.

ADD COMMENT • link 22 months ago Gordon Smyth 53k

0

Entering edit mode

Thank you so much, Gordon!

The weight loss actually differs more in the real data and some participants even regain body weight between follow-ups 2 and 3. I'm sorry that my data example did not show this variability.

The idea with m2 was to try to find the association between the change in relative body weight (wlp) and gene expression. The inclusion of only the interaction term in m2 results in same number of terms as in m1. The main difference between the two models is that in m1 the predictors are dichotomous but in m2 they are continuous (see figures model matrixes below). Therefore, I was wondering whether this design could capture the "effect" related to wlp change. The two models produce very similar set of DE genes. In m1 I would interpret that the fold changes show the DE between baseline and the follow-up point, and in model 2, the fold change between baseline and the follow-up point per one percentage of weight loss. What do you think?

# covariate data
participant <- factor(rep(1:4, each = 3))
visit <- factor(rep(c("bl", "fup1", "fup2"), times = 4), levels = c("bl", "fup1", "fup2"))
wlp <- c(0, -7, -15, 0, -12, -24, 0, -6, -13, 0, -9, -18)

# model 1, standard time effect
m1 <- model.matrix(~ 0 + participant + visit)
colnames(m1)
head(m1)

head(m1)

# model 2, time effect by body weight loss percentage
m2 <- model.matrix(~ 0 + participant + visit:wlp)
m2 <- m2[,-5] # remove "visitbl:wlp" column
colnames(m2) <- gsub(":", "_", colnames(m2))
colnames(m2)
head(m2)

head(m2)

The model model you suggested m2 <- model.matrix(~ participant + wlp) indeed gives very similar DE to the m1. Thank you also for the suggestion on m3, the two explanatory variables mostly cancel each other out.

ADD REPLY • link 22 months ago jari.karppinen • 0

0

Entering edit mode

I've already told you how I would analyse the data.

Your model m2 represents a quite artificial model and it doesn't correspond very closely to what you said you wanted to do. It assumes no expression changes at either followup time in the absence of weight change, which seems a very strong assumption. It also allows completely different responses to weight changes at the two followups, which seems unintuitive. I don't see the logic of a visit by wlp interaction. But honestly, if you know what model m2 model represents and you want to use it, then go ahead, it's your analysis. On the other hand, if you're not confident of what the model means, then I wonder why you are proposing it.

Anyway, you're asking for scientific analysis advice here rather than for advice in how to use the software. I can only give limited scientific advice because I don't know the data or the scientific context.

ADD REPLY • link 22 months ago Gordon Smyth 53k

0

Entering edit mode

Thank you! You are correct. I was just wondering whether the m2 could offer an alternative way to analyze the data. I'll continue as you proposed.

ADD REPLY • link 22 months ago jari.karppinen • 0