design matrix for 4 groups in edgeR
alakatos ▴ 130
@alakatos-6983
Last seen 4.6 years ago
United States

Hello,

My data consist of RNA counts from 4 phenotypes with biological replicates. The phenotypes are combinations of 2 genotypes, R and A, giving four groups: wild type, R, A and RA.

I am looking for DE genes between groups, and I am wondering which model is best for these data: one factor with 4 levels plus contrasts, or a GLM with 2 variables and their interaction.

Thank you for your help in advance.

Anita 

 


What's the interaction in this case? It seems to me you just have four groups and want to compare them, which means there isn't really an interaction (except possibly in a genetic context). Statistically, for there to be an interaction, you would need, say, two genotypes and two treatments, not four different genotypes.

Aaron Lun ★ 28k
@alun
Last seen 10 hours ago
The city by the bay

I agree with James. It appears you have a one-way layout with four groups. The simplest approach, then, would be to just parameterize the design matrix using these four groups. For example, if you have two replicates in each group, you can do:

grouping <- c("WT", "WT", "R", "R", "A", "A", "RA", "RA")
design <- model.matrix(~0+grouping)

This approach will be the easiest to interpret, as you can formulate the contrasts to compare between your groups.
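As a minimal sketch of formulating those contrasts with base R only: the factor levels are set explicitly here (an assumption, since a plain character vector would order the columns alphabetically as A, R, RA, WT), and the shortened column names and contrast names are purely illustrative. The commented edgeR calls assume a hypothetical `y`, a DGEList with dispersions already estimated.

```r
# Same layout as above, but with an explicit level order so WT comes first.
grouping <- factor(c("WT", "WT", "R", "R", "A", "A", "RA", "RA"),
                   levels = c("WT", "R", "A", "RA"))
design <- model.matrix(~0 + grouping)
colnames(design) <- levels(grouping)   # shorten "groupingWT" etc. to "WT" etc.

# One column per comparison; rows follow colnames(design).
contrasts <- cbind(
  RvsWT  = c(WT = -1, R =  1, A =  0, RA = 0),
  AvsWT  = c(WT = -1, R =  0, A =  1, RA = 0),
  RAvsWT = c(WT = -1, R =  0, A =  0, RA = 1),
  RAvsR  = c(WT =  0, R = -1, A =  0, RA = 1),
  RAvsA  = c(WT =  0, R =  0, A = -1, RA = 1)
)
stopifnot(identical(rownames(contrasts), colnames(design)))

# With edgeR (assuming `y` is a DGEList with dispersions estimated):
# fit <- glmQLFit(y, design)
# res <- glmQLFTest(fit, contrast = contrasts[, "RAvsWT"])
# topTags(res)
```

Each contrast column sums to zero, so every test compares group means rather than testing a group mean against zero.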

If you want a genetic interaction model (e.g., to consider epistasis), then you might do something like this. I'll reuse the example above, and assume that R and A are mutations in separate genes. We can then do:

is.a <- c(0,0,0,0,1,1,1,1)
is.r <- c(0,0,1,1,0,0,1,1)
design <- model.matrix(~factor(is.a)*factor(is.r))

If you drop the fourth coefficient (the interaction term), you're testing the interaction between genotypes R and A, i.e., your null hypothesis is that the combined effect of genotype RA is equal to the sum of the individual effects for genotypes R and A on the log scale. Rejection may indicate epistasis (where the combined effect is less than the predicted additive effect) or synergy (if the combined effect is greater).
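To check which coefficient is the fourth one, it can help to print the column names of this design. This is only a short sketch; the commented edgeR call assumes a hypothetical `fit` object returned by `glmQLFit` on this design.

```r
is.a <- c(0,0,0,0,1,1,1,1)
is.r <- c(0,0,1,1,0,0,1,1)
design <- model.matrix(~factor(is.a)*factor(is.r))

# Columns: intercept (WT), A effect, R effect, A:R interaction.
print(colnames(design))
# "(Intercept)" "factor(is.a)1" "factor(is.r)1" "factor(is.a)1:factor(is.r)1"

# The interaction column is 1 only for the RA samples (the last two).
interaction.col <- design[, 4]

# With edgeR (assuming `fit` comes from glmQLFit on this design):
# res <- glmQLFTest(fit, coef = 4)
```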

---

Edit: Mind you, the results from the second approach can be quite difficult to interpret. You'll need to look at the signs of the second and third coefficients to figure out what's going on. For example, if both the A and R mutations result in a 2-fold increase over WT, then a positive value for the RA interaction term might represent synergy. On the other hand, if both single mutations result in a 2-fold decrease against WT, then a positive value for the RA interaction term might represent epistasis (as additional decreases are not observed). One could also imagine a bunch of strange scenarios; for example, if A results in a 2-fold increase and R results in a 2-fold decrease, then a positive RA interaction term might represent some dominance effect of A over R.
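The three scenarios can be put into numbers with a small sketch. The log2 fold changes below are hypothetical values chosen only for illustration, and `interaction.term` is a helper defined here, not an edgeR function.

```r
# Interaction on the log2 scale = RA effect - (A effect + R effect),
# where each effect is a log2 fold change against WT.
interaction.term <- function(lfc.a, lfc.r, lfc.ra) lfc.ra - (lfc.a + lfc.r)

# Both single mutants up 2-fold, RA up 8-fold: more than additive.
synergy <- interaction.term(lfc.a = 1, lfc.r = 1, lfc.ra = 3)

# Both single mutants down 2-fold, RA also only down 2-fold: no further decrease.
no.further.decrease <- interaction.term(lfc.a = -1, lfc.r = -1, lfc.ra = -1)

# A up 2-fold, R down 2-fold, RA looks like A alone.
a.dominates <- interaction.term(lfc.a = 1, lfc.r = -1, lfc.ra = 1)

print(c(synergy, no.further.decrease, a.dominates))
```

All three calls give an interaction of +1, which is why the sign of the interaction term alone does not say what is happening biologically.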

@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Dear Anita,

Aaron has already given you a complete answer, but I think that you might not have got the main point yet, which is that it doesn't matter which design matrix you use, whether you define a factorial model or a one-way layout. The two are reparametrizations of the same model, so they give exactly the same fit and the same result for any equivalent comparison.

However, I strongly, strongly recommend that you use the one-way layout, because it is much easier to work with. The factorial model is only for statistical experts and even then only for particular purposes.
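The equivalence of the two design matrices can be checked with a small sketch. It uses ordinary least squares (`lm.fit`) on hypothetical log-expression values for a single gene rather than edgeR's negative binomial GLM, so it only illustrates the reparametrization argument; the values in `y` are made up.

```r
grouping <- factor(rep(c("WT", "R", "A", "RA"), each = 2),
                   levels = c("WT", "R", "A", "RA"))
is.a <- c(0,0,0,0,1,1,1,1)
is.r <- c(0,0,1,1,0,0,1,1)

oneway <- model.matrix(~0 + grouping)
colnames(oneway) <- levels(grouping)
factorial <- model.matrix(~factor(is.a)*factor(is.r))

# Both matrices span the same column space: stacking them adds no rank.
combined.rank <- qr(cbind(oneway, factorial))$rank   # 4

# Hypothetical log-expression values for one gene (illustration only).
y <- c(5.1, 4.9, 6.0, 6.2, 5.9, 6.1, 8.0, 8.2)
fit1 <- lm.fit(oneway, y)
fit2 <- lm.fit(factorial, y)

# Fitted values agree between the two parametrizations.
same.fit <- max(abs(fit1$fitted.values - fit2$fitted.values)) < 1e-8

# The factorial interaction equals the one-way contrast RA - R - A + WT.
b1 <- fit1$coefficients
interaction.from.oneway <- unname(b1["RA"] - b1["R"] - b1["A"] + b1["WT"])
interaction.from.factorial <- unname(fit2$coefficients[4])
print(all.equal(interaction.from.oneway, interaction.from.factorial))
```

So the interaction test is still available from the one-way layout as the contrast RA - R - A + WT, alongside the simpler pairwise comparisons.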

alakatos ▴ 130
@alakatos-6983
Last seen 4.6 years ago
United States

Thank you for your responses. I should have made it clear from the beginning: they are animal models of mutation(s). Based on your responses, I guess I will use a one-way layout with four groups, since we are looking for DE genes between groups.

alakatos ▴ 130
@alakatos-6983
Last seen 4.6 years ago
United States

Dear Dr. Smyth,

Thank you very much for your answer. It is completely clear now.

Thanks once again.

Anita 


For future reference, this kind of post should be added as a comment to Gordon's answer, rather than as a separate answer in itself. The ordering of the answers in the thread changes according to the number of votes for each answer, so your message is not guaranteed to immediately follow Gordon's answer. The comment system is a better alternative when you want to respond to a particular answer.
