Overlapping treatments in the design matrix
1
0
Entering edit mode
jhj89 ▴ 10
@jhj89-9623
Last seen 6.9 years ago

Hi,

I am trying to create another design matrix with 4 groups in mind:

1. DiseaseA+DiseaseB, 2. DiseaseA, 3. DiseaseB, 4. Control.

I could create appropriate design matrix to basically perform several analyses of interest such as finding differentially expressed genes between any two groups, However, what I want to find is the genes that are differentially expressed between Disease A and Disease B, but I would like to incorporate DiseaseA+DiseaseB group into this comparison. I have tried various ways such as incorporating a table such as below into creating a design matrix:

DiseaseA     |     DiseaseB

         1         |           1

         1         |           0

         0         |           1

         0         |           0

but it is not doing what I thought it would do. Any suggestions will be very helpful.

Thank you.

 

 

 

edger design matrix • 1.0k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

Well, you need to make up your mind as to what you want to do. If you want to compare between any two groups, then use a one-way layout where each disease status (control, disease A only, B only, or A + B) forms its own group. As an example, here's how I would do it if the target information were present in has.A and has.B:

has.A <- rep(c("Y", "N"), each=4)
has.B <- rep(rep(c("Y", "N"), each=2), 2)
groups <- factor(paste0("A", has.A, ".B", has.B))
design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)

You can then compare between any set of groups, e.g., between disease A and B using makeContrasts:

con <- makeContrasts(AY.BN - AN.BY, levels=design)

If you want to use the A + B information in the comparison between A and B, then you need an additive model:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

However, this assumes that the effects of A and B are additive in the group with both diseases. One can easily imagine that this will not be the case in reality, e.g., due to epistatic effects or other interactions between the disease states. If the assumption is violated, you will probably get a decrease in power because the dispersion estimate gets inflated by the poor fit. This may end up being more damaging than just leaving the A + B samples out of the comparison in the first place.

In short, I would go for the one-way layout. It's more flexible in terms of the comparisons you want to do, and also in terms of avoiding the additivity assumptions in the other model. That said, I hope the design matrix in your original post is just an example, and that you actually have replicates within each group, otherwise you'll end up with other problems.

ADD COMMENT
0
Entering edit mode

Thank you for your answer Aaron.

Yes the above is just an example. And yes, your point about them not merely being additive is valid. I will keep that in mind.

ADD REPLY
0
Entering edit mode

Hi Aaron,

Might be a very dumb question but what happens if I have a design matrix such that:

design <- model.matrix(~0+has.A+has.B)

and make a contrast:

con <- makeContrasts(has.AY - has.BY, levels = design)?

The reason I'm asking is if we have a design matrix like (design <- model.matrix(~ has.A + has.B)), I'm not entirely sure what the Intercept term refers to; does it refer to the baseline - has.AN and has.BN. It's slightly confusing to me.

Thanks.

ADD REPLY
0
Entering edit mode

The intercept represents to the average expression of the control group. If you make the design matrix without an intercept, the first column has.AN represents to the average expression of the control; the second column has.AY represents to the average represents the average expression of the A-only group; and the third column has.BY represents the additive effect of B over the control/over the A-only group. I don't think the contrast you've specified above will make any sense, though, because you're asking for whether the expression in the A-only group is equal to the log-fold change of B.

ADD REPLY
0
Entering edit mode

Hi Aaron,

I have one more question regarding the design matrix above. I'm wondering whether the comparison:

design <- model.matrix(~ has.A + has.B)
con <- makeContrasts(has.AY - has.BY, levels=design)

is merely diseaseA VS disease B or is [diseaseA - control] VS [diseaseB - control]. If it is the former, how do I go about doing the latter comparison? Again, thank you for your time in answering the questions.

ADD REPLY
0
Entering edit mode

It's the latter. Each has.XY coefficient represents the effect of disease X over control.

ADD REPLY

Login before adding your answer.

Traffic: 1031 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6