Question

Question regarding model matrix in limma.

0

Entering edit mode

Ben ▴ 50

@ben-17772

Last seen 8 months ago

United States

I am following the advice: "If you can't find an answer to your question within one hour - ask someone!"

So here I write, my question regarding model matrices in limma or DESeq2.

I will copy a reprex first:

# Create data frame with information.
groups <- data.frame(group = c(rep(1,6), rep(2,6), rep(3,6), rep(4,6)),
                     subject = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7,8,8,8),
                     treatment = c(rep("a",6),rep("b",6),rep("a",6),rep("b",6)),
                     time = c(rep(c(1,2,3), 4), rep(c(1,4,5), 4)))

# Setting levels for fixed effects.
groups$treatment <- factor(groups$treatment)
groups$time <- factor(groups$time)
groups$group <- factor(groups$group)

# Creating design formula.
design <- model.matrix( ~ 0 + treatment + treatment:time, data = groups)
design
#>    treatmenta treatmentb treatmenta:time2 treatmentb:time2
#> 1           1          0                0                0
#> 2           1          0                1                0
#> 3           1          0                0                0
#> 4           1          0                0                0
#> 5           1          0                1                0
#> 6           1          0                0                0
#> 7           0          1                0                0
#> 8           0          1                0                1
#> 9           0          1                0                0
#> 10          0          1                0                0
#> 11          0          1                0                1
#> 12          0          1                0                0
#> 13          1          0                0                0
#> 14          1          0                0                0
#> 15          1          0                0                0
#> 16          1          0                0                0
#> 17          1          0                0                0
#> 18          1          0                0                0
#> 19          0          1                0                0
#> 20          0          1                0                0
#> 21          0          1                0                0
#> 22          0          1                0                0
#> 23          0          1                0                0
#> 24          0          1                0                0
#>    treatmenta:time3 treatmentb:time3 treatmenta:time4 treatmentb:time4
#> 1                 0                0                0                0
#> 2                 0                0                0                0
#> 3                 1                0                0                0
#> 4                 0                0                0                0
#> 5                 0                0                0                0
#> 6                 1                0                0                0
#> 7                 0                0                0                0
#> 8                 0                0                0                0
#> 9                 0                1                0                0
#> 10                0                0                0                0
#> 11                0                0                0                0
#> 12                0                1                0                0
#> 13                0                0                0                0
#> 14                0                0                1                0
#> 15                0                0                0                0
#> 16                0                0                0                0
#> 17                0                0                1                0
#> 18                0                0                0                0
#> 19                0                0                0                0
#> 20                0                0                0                1
#> 21                0                0                0                0
#> 22                0                0                0                0
#> 23                0                0                0                1
#> 24                0                0                0                0
#>    treatmenta:time5 treatmentb:time5
#> 1                 0                0
#> 2                 0                0
#> 3                 0                0
#> 4                 0                0
#> 5                 0                0
#> 6                 0                0
#> 7                 0                0
#> 8                 0                0
#> 9                 0                0
#> 10                0                0
#> 11                0                0
#> 12                0                0
#> 13                0                0
#> 14                0                0
#> 15                1                0
#> 16                0                0
#> 17                0                0
#> 18                1                0
#> 19                0                0
#> 20                0                0
#> 21                0                1
#> 22                0                0
#> 23                0                0
#> 24                0                1
#> attr(,"assign")
#>  [1] 1 1 2 2 2 2 2 2 2 2
#> attr(,"contrasts")
#> attr(,"contrasts")$treatment
#> [1] "contr.treatment"
#> 
#> attr(,"contrasts")$time
#> [1] "contr.treatment"

^{Created on 2019-07-03 by the reprex package (v0.3.0)}

As one can see, the dataframe created contains four columns: Group, Subject, Treatment, and Time. We have repeated measures for the subjects at different time points, and we have four groups, each group receiving a treatment at a specific time point. There is no control group in a "classical" sense. We are basically interested in differences between the groups, and this can mean differences occurring as a consequence of the treatment or differences with the same treatment but occurring as a consequence of different intervention time points.

I am bit thrown off right now because of the missing control I guess.

Modeling this experimental setup using an intercept (~ treatment + treatment:time) shows, when I create a model matrix, an intercept representing the mean of all samples (each row in column "intercept" in the model matrix contains "1"). Each coefficient will be based on this mean (difference to this mean). This is of course not what I want. Different groups will represent the baseline within a specific comparison/contrast.

So I created the design formula with ~ 0 + treament + treatment:time. The design matrix is shown in my reprex code. If I want to look for the comparison of group 4 between time point 1 compared to time point 5, how does my contrast look like? If I use the contrast "group4:time5" in the reprex, what is the comparison here?

Thanks for everybody's help. I appreciate it. Especially since this might not be a new question and maybe someone already wrote a similar question which was answered in the past.

PS. I am not worried about the subject information at this point. My plan is to use limma and provide subjects as a blocking factor, or use the dream function with subject as a random effect.

limma deseq2 • 1.2k views

ADD COMMENT • link updated 5.6 years ago by James W. MacDonald 68k • written 5.6 years ago by Ben ▴ 50

score 0 · Answer 1 · 2019-07-03

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 1 day ago

United States

You are not interpreting your design matrix correctly. The intercept isn't the mean of all the samples; it's the mean of the treatment A samples at time 1, because this is a factor effects model (in R, it's a treatment contrast parameterization, hence the contr.treatment attribute for the design matrix). If you omit the intercept (like you do here), then treatmenta and treatmentb are the mean of the two treatments at time 1, and all the other coefficients are the difference between a given time and time 1 for that treatment.

So treatmentb:time5 is the difference between treatment b at time 5 and treatment b at time 1. Beyond that, I have no idea what a 'reprex' might be, so I don't know what 'the contrast group4:time5 in the reprex' means, given that you have no coefficients with that name.

ADD COMMENT • link 5.6 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks for your swift response!

I was assuming that I do make some mistakes in interpreting the model matrix. Your clarification helps!

Sorry for the group4:time5 mixup... that was my mistake and you are right, that does not exist in the model matrix.

But at the end this is (among other contrasts) what I want: A contrast with only group 4 samples between time point 1 and 5. Group 4 belongs to treatment b, but so is also group 2 (see the table created at the beginning of my example). But since group 2 has a different schedule (and my real data is even more complex), there is a case where I really just want group 4 time point 1 vs. 5. I do assume right now, this is not possible with the current matrix. But I am also not sure how to construct the model matrix differently. Including "group" as a fixed factor too?

ADD REPLY • link 5.6 years ago Ben ▴ 50

0

Entering edit mode

The usual advice in this situation is to forgo using the conventional R model building, and instead re-formulate your data so you can compute the mean of each group. So something like

alldat <- apply(groups[,c(3,1,4)], 1, paste, collapse = "_")

will give you a vector of individual groups that you can use to generate the model matrix (for which you should use ~0 so you compute the mean of each group, and can make comparisons using contrasts), and you can then make any comparison you would like. However, you also need to account for the fact that you have pairing, so you need to reformulate the subjects so you can have a full rank design matrix. There is an example that may be applicable in section 3.5 of the edgeR User's Guide. If this seems mysterious to you, I recommend you find somebody local for whom it is not. It's pretty easy to botch the job if you don't really know what you are doing.

ADD REPLY • link 5.6 years ago James W. MacDonald 68k

0

Entering edit mode

reprex is a convenient way to share code that is self sufficient aka reproducible. https://cran.r-project.org/web/packages/reprex/index.html Best.