Question: EdgeR. Analysis with pairing across condition and treatment. Design Matrix
4.9 years ago by
chwhite0
United States
chwhite0 wrote:

I have the following experimental design and wanted to check the construction of the design matrix as well as a few other questions about setting it up.

# 4 patients x 2 conditions x 3 treatments = 24 samples
Patient   <- rep(rep(c("P1", "P2", "P3", "P4"), each = 3), times = 2)
Condition <- rep(c("uninf", "inf"), each = 12)
Treatment <- rep(c("D", "S", "R"), times = 8)
Exp_Design <- data.frame(Patient, Condition, Treatment)

From the edgeR user guide, an example that had pairing across treatment but not across Condition (Section 3.5) used the following design matrix.

design<-model.matrix(~Condition+Condition:Patient+Condition:Treatment)

In my case, though, I have pairing across both treatment and condition, so would the proper design matrix be the following?

design2<-model.matrix(~Condition+Patient+Condition:Treatment)

My second question.  The intercept has created confusion in my lab whenever I have to try to explain it to someone.
I tried to remove the intercept by using the following.

design3<-model.matrix(~0+Condition+Patient+Condition:Treatment)

This creates a design matrix with a column for each condition, so I can build a contrast to get the difference between conditions, but one patient and one Condition:Treatment term are missing.
I don't believe I have removed the intercept parameterization properly.  How would I go about setting that up?

Final question.  With this experimental setup, would I be better served by running several paired comparisons
where I separate the data based upon which condition (inf,uninf) I'm looking at?
Example.

keep<-which(Condition%in%"inf")
Pat2<-Patient[keep]
Cond2<-Condition[keep]
Treat2<-Treatment[keep]

Exp_design_inf<-cbind(Pat2,Cond2,Treat2)
design_inf<-model.matrix(~Pat2+Treat2)

And then repeat that procedure with the uninf condition.
If this is the route to take, how would I create the design matrix with pairing but without parameterizing using the intercept?

modified 4.9 years ago by Gordon Smyth • written 4.9 years ago by chwhite0

The easiest way to define the design matrix depends partly on what comparisons you want to make. What comparisons between Conditions and Treatments do you want to make?

Thanks for the information.

I am interested in comparing the effect of the treatment (D vs. S, D vs. R, S vs. R) in both inf and uninf conditions.  I am also interested in the differences between the inf and uninf conditions under the effect of all 3 treatments.

The reason it is a paired design across condition is that each patient has their sample collected, and then half of that sample is infected with a virus while the other half is left alone.  After that, each half is treated with the solvent (D) or one of the two drugs of interest.

Answer: EdgeR. Analysis with pairing across condition and treatment. Design Matrix
4.9 years ago by
Gordon Smyth
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth wrote:

Your design is a blocked design (a generalization of a paired design) and is covered by Section 3.4.2 in the edgeR User's Guide.

Thanks for clarifying the comparisons you want to make. Block designs are easy to analyse. You simply add +Patient to the model formula, like this:

design <- model.matrix(~ XXXX + Patient)

where XXXX is whatever you would have done without the blocking variable.

In your case, you have two treatment factors with all combinations of the factor levels appearing in your experiment. So you can follow Section 3.3.1 of the edgeR User's Guide to set up a combined factor for Treatment and Condition, and then just add +Patient. Pretty simple when you get the idea!
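A minimal sketch of the combined-factor setup described above, written in base R so that it runs without edgeR loaded. The names Group, make_contrast, and the two contrast vectors are illustrative assumptions, not taken from the User's Guide, and only two of the requested comparisons are shown.

```r
# Experimental layout from the question: 4 patients x 2 conditions x 3 treatments
Patient   <- factor(rep(rep(c("P1", "P2", "P3", "P4"), each = 3), times = 2))
Condition <- rep(c("uninf", "inf"), each = 12)
Treatment <- rep(c("D", "S", "R"), times = 8)

# Combined factor: one level per Condition/Treatment combination
Group <- factor(paste(Condition, Treatment, sep = "."))

# One column per group (no intercept), plus blocking columns for patients P2..P4
design <- model.matrix(~ 0 + Group + Patient)
colnames(design) <- sub("^Group", "", colnames(design))

# Hypothetical helper: build a contrast vector indexed by design column name;
# columns that are not named stay at zero
make_contrast <- function(plus, minus) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[plus]  <- 1
  v[minus] <- -1
  v
}

S_vs_D_in_inf  <- make_contrast("inf.S", "inf.D")    # S vs D within infected samples
inf_vs_uninf_D <- make_contrast("inf.D", "uninf.D")  # infected vs uninfected under D
```

In an edgeR analysis, vectors like these would typically be passed as the contrast argument of glmLRT or glmQLFTest after fitting with this design. limma's makeContrasts can build the same vectors from expressions such as inf.S - inf.D.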

Thanks, and answered above.  I had read about the blocking effect but was unsure how to insert the Condition information into the design.  If I separated the two conditions, then I would use the following.

design <- model.matrix(~Patient+Treatment)

I was also confused about how to set this up as in Section 3.3.1 without the intercept, as that was conceptually easier to understand and explain to others.

If you want to remove the intercept, you have to use ~0+Treatment+Patient and not ~0+Patient+Treatment. In other words, follow the advice I gave you exactly.

There isn't any reason to separate the two conditions. It is better to analyze all the data together.
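To see why the order of terms matters once the intercept is dropped, here is a small base R sketch comparing the column names of the two formulas. For brevity it assumes a single condition's worth of samples (four patients, three treatments); the variable names are illustrative.

```r
Patient   <- factor(rep(c("P1", "P2", "P3", "P4"), each = 3))
Treatment <- factor(rep(c("D", "S", "R"), times = 4))

# With ~0, the first factor in the formula gets one column per level;
# later factors are coded as differences from their first level.
treatment_first <- colnames(model.matrix(~ 0 + Treatment + Patient))
patient_first   <- colnames(model.matrix(~ 0 + Patient + Treatment))

print(treatment_first)  # TreatmentD, TreatmentR, TreatmentS, then PatientP2..PatientP4
print(patient_first)    # PatientP1..PatientP4, then TreatmentR, TreatmentS
```

Putting Treatment first keeps a column for every treatment level, which is what makes contrasts between treatments straightforward to write down.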

Will do.  One last thing.  I noticed that when I use ~0+Treatment+Patient, the design contains all of the Treatments but is missing the first patient.  Since I'm not using the subtraction design, do I need all Patients in this?  A workaround I played with was to add in a 0 level when generating the Patient factor.

Patient<-factor(Experimental_Design$Patient,levels=c(0,1,2,3,4))

When using the design with this, all Patients show up.  Is this the correct approach, or should I let model.matrix remove the first patient?

Edit.  When trying this it wouldn't actually estimate the dispersions as the coefficient for patient 4 was not estimable.  So, I guess leave the levels as 1, 2, 3, and 4.

One cannot include a coefficient for all the patients -- it would produce an over-parametrized model. There may be four patients, but there are only three differences between the patients, so there can only be three coefficients in the linear model. This isn't something you need to worry about anyway because you should not be testing contrasts between the patients. Just let the software do the right thing for you.
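The over-parametrization can be checked numerically. The sketch below, again in base R with illustrative names and a single condition's worth of samples, appends a column for the first patient by hand and compares the matrix rank with the number of columns.

```r
Patient   <- factor(rep(c("P1", "P2", "P3", "P4"), each = 3))
Treatment <- factor(rep(c("D", "S", "R"), times = 4))

# Design as produced by model.matrix: all treatments, patients P2..P4
design <- model.matrix(~ 0 + Treatment + Patient)

# Hypothetical over-parametrized version: append an indicator column for P1
design_all <- cbind(design, PatientP1 = as.numeric(Patient == "P1"))

# The three treatment columns sum to 1 in every row, and so do the four
# patient columns, so the columns of design_all are linearly dependent.
rank_ok  <- qr(design)$rank      # equals ncol(design)
rank_all <- qr(design_all)$rank  # one less than ncol(design_all)
```

A rank below the column count means at least one coefficient cannot be estimated, which is consistent with the "not estimable" error reported above.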