Question: Overlapping treatments in the design matrix
0
3.8 years ago by
jhj8910
jhj8910 wrote:

Hi,

I am trying to create another design matrix with 4 groups in mind:

1. DiseaseA+DiseaseB, 2. DiseaseA, 3. DiseaseB, 4. Control.

I could create appropriate design matrix to basically perform several analyses of interest such as finding differentially expressed genes between any two groups, However, what I want to find is the genes that are differentially expressed between Disease A and Disease B, but I would like to incorporate DiseaseA+DiseaseB group into this comparison. I have tried various ways such as incorporating a table such as below into creating a design matrix:

DiseaseA     |     DiseaseB

1         |           1

1         |           0

0         |           1

0         |           0

but it is not doing what I thought it would do. Any suggestions will be very helpful.

Thank you.

edger design matrix • 577 views
modified 3.8 years ago by Aaron Lun25k • written 3.8 years ago by jhj8910
Answer: Overlapping treatments in the design matrix
3
3.8 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Well, you need to make up your mind as to what you want to do. If you want to compare between any two groups, then use a one-way layout where each disease status (control, disease A only, B only, or A + B) forms its own group. As an example, here's how I would do it if the target information were present in has.A and has.B:

has.A <- rep(c("Y", "N"), each=4)
has.B <- rep(rep(c("Y", "N"), each=2), 2)
groups <- factor(paste0("A", has.A, ".B", has.B))
design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)


You can then compare between any set of groups, e.g., between disease A and B using makeContrasts:

con <- makeContrasts(AY.BN - AN.BY, levels=design)

If you want to use the A + B information in the comparison between A and B, then you need an additive model:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

However, this assumes that the effects of A and B are additive in the group with both diseases. One can easily imagine that this will not be the case in reality, e.g., due to epistatic effects or other interactions between the disease states. If the assumption is violated, you will probably get a decrease in power because the dispersion estimate gets inflated by the poor fit. This may end up being more damaging than just leaving the A + B samples out of the comparison in the first place.

In short, I would go for the one-way layout. It's more flexible in terms of the comparisons you want to do, and also in terms of avoiding the additivity assumptions in the other model. That said, I hope the design matrix in your original post is just an example, and that you actually have replicates within each group, otherwise you'll end up with other problems.

Yes the above is just an example. And yes, your point about them not merely being additive is valid. I will keep that in mind.

Hi Aaron,

Might be a very dumb question but what happens if I have a design matrix such that:

design <- model.matrix(~0+has.A+has.B)

and make a contrast:

con <- makeContrasts(has.AY - has.BY, levels = design)?

The reason I'm asking is if we have a design matrix like (design <- model.matrix(~ has.A + has.B)), I'm not entirely sure what the Intercept term refers to; does it refer to the baseline - has.AN and has.BN. It's slightly confusing to me.

Thanks.

The intercept represents to the average expression of the control group. If you make the design matrix without an intercept, the first column has.AN represents to the average expression of the control; the second column has.AY represents to the average represents the average expression of the A-only group; and the third column has.BY represents the additive effect of B over the control/over the A-only group. I don't think the contrast you've specified above will make any sense, though, because you're asking for whether the expression in the A-only group is equal to the log-fold change of B.

Hi Aaron,

I have one more question regarding the design matrix above. I'm wondering whether the comparison:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

is merely diseaseA VS disease B or is [diseaseA - control] VS [diseaseB - control]. If it is the former, how do I go about doing the latter comparison? Again, thank you for your time in answering the questions.

It's the latter. Each has.XY coefficient represents the effect of disease X over control.