DESeq2/edgeR/limma help with most appropriate design matrix

Hi all,

I am trying to determine the most appropriate design matrix to use for an RNA-seq experiment detailed below. Since my statistics knowledge isn't strong, I'd really appreciate some advice.

Experimental design

We have four patients (specifically patient-derived cell lines, A to D), and each patient was subjected to four different conditions: a combination of Oxygen (normoxia or hypoxia) plus Treatment (no drug or drug). I combined these conditions into a single factor (Group) with four levels to help with construction of the design matrix.

library(tidyverse)  # for unite(), mutate(), select() and %>%
targets <-
  data.frame("Patient" = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
             # the shorter Oxygen and Treatment vectors are recycled down the 16 rows
             "Oxygen" = c(rep("Normoxia", 2), rep("Hypoxia", 2)),
             "Treatment" = c("NoDrug", "Drug")) %>% 
  unite("Group", Oxygen:Treatment, sep = ".", remove = FALSE) %>%
  mutate(Oxygen = factor(Oxygen, levels = c("Normoxia", "Hypoxia")),
         Treatment = factor(Treatment, levels = c("NoDrug", "Drug")),
         Group = factor(Group,
                        levels = c("Normoxia.NoDrug", "Normoxia.Drug",
                                   "Hypoxia.NoDrug", "Hypoxia.Drug"))) %>% 
  select(Patient, Oxygen, Treatment, Group)

> targets

Patient Oxygen      Treatment   Group
A       Normoxia    NoDrug      Normoxia.NoDrug 
A       Normoxia    Drug        Normoxia.Drug   
A       Hypoxia     NoDrug      Hypoxia.NoDrug  
A       Hypoxia     Drug        Hypoxia.Drug    
B       Normoxia    NoDrug      Normoxia.NoDrug 
B       Normoxia    Drug        Normoxia.Drug   
B       Hypoxia     NoDrug      Hypoxia.NoDrug  
B       Hypoxia     Drug        Hypoxia.Drug    
C       Normoxia    NoDrug      Normoxia.NoDrug 
C       Normoxia    Drug        Normoxia.Drug
C       Hypoxia     NoDrug      Hypoxia.NoDrug  
C       Hypoxia     Drug        Hypoxia.Drug    
D       Normoxia    NoDrug      Normoxia.NoDrug 
D       Normoxia    Drug        Normoxia.Drug   
D       Hypoxia     NoDrug      Hypoxia.NoDrug  
D       Hypoxia     Drug        Hypoxia.Drug
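
A quick way to confirm that this is a complete block design (every patient contributes exactly one sample to every Group) is to cross-tabulate the two factors. The sketch below rebuilds `targets` in base R instead of the tidyverse, only so that it runs without extra packages; the variable `group_levels` is an illustrative name, and the layout is assumed to match the table above.

```r
# Rebuild the sample sheet in base R (same row order as the table above)
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
targets <- data.frame(
  Patient = factor(rep(c("A", "B", "C", "D"), each = 4)),
  Group   = factor(rep(group_levels, times = 4), levels = group_levels)
)

# Cross-tabulate: a complete block design has exactly one sample per cell
tab <- table(targets$Patient, targets$Group)
print(tab)
all(tab == 1)  # TRUE: every patient has one sample in every group
```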

Key biological questions

We want to find differentially expressed genes between the following:

  • Hypoxia.Drug vs Normoxia.NoDrug
  • Hypoxia.Drug vs Normoxia.Drug
  • Hypoxia.Drug vs Hypoxia.NoDrug

Additionally (but not as important):

  • Normoxia.Drug vs Normoxia.NoDrug
  • Hypoxia.NoDrug vs Normoxia.NoDrug

Possible design matrix?

I have tried my best to find a similar experimental design. I think it is most similar to Section 3.4.2 (Blocking) of the edgeR User's Guide, except that here we have four sets of four samples (one set per patient) rather than pairs of samples. Is that correct?

In the PCA plot, the samples largely separate by Group, apart from Patient A. Therefore, I am thinking an additive model will be most appropriate, one that includes a Patient term and a Group term (the combination of Oxygen and Treatment), as below?

Group <- targets$Group
Patient <- targets$Patient
# edgeR/limma use the model matrix below; DESeq2 instead takes the formula itself (design = ~ Patient + Group)
design <- model.matrix(~ Patient + Group) 
colnames(design) <- gsub(x = colnames(design), pattern = "Group",
                         replacement = "")
colnames(design)
[1] "(Intercept)"    "PatientB"       "PatientC"       "PatientD"       "Normoxia.Drug"  "Hypoxia.NoDrug" "Hypoxia.Drug"
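
Before fitting, it may be worth checking that the additive design has full column rank, i.e. that no coefficient is confounded with another. A minimal base-R sketch, assuming the same balanced layout as `targets` (limma's `nonEstimable()` performs a similar check and returns `NULL` when every coefficient is estimable):

```r
# Same factors as in targets, rebuilt so the block is self-contained
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
Group   <- factor(rep(group_levels, times = 4), levels = group_levels)

design <- model.matrix(~ Patient + Group)
colnames(design) <- gsub("Group", "", colnames(design))

dim(design)                      # 16 7: 16 samples, 7 coefficients
qr(design)$rank == ncol(design)  # TRUE: the design has full column rank
```

With 16 samples and 7 coefficients, a linear model fitted with this design has 16 - 7 = 9 residual degrees of freedom.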

With an additive model, I believe the assumption is that each Oxygen plus Treatment combination has the same effect in all patients, with patients differing only in their baseline expression. Is that what we see here? The coefficients from this model can also easily be used to form the contrasts of interest above.
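
Because Normoxia.NoDrug is the reference level absorbed into the intercept, each comparison of interest is either a single Group coefficient or a difference of two. The sketch below writes the five comparisons as a contrast matrix over the seven columns listed above; the helper `make_contrast()` and the contrast names are hypothetical, chosen for illustration. limma's `makeContrasts()` builds the same kind of matrix from expressions such as `Hypoxia.Drug - Normoxia.Drug`.

```r
# Column names of the intercept design, as printed above
coefs <- c("(Intercept)", "PatientB", "PatientC", "PatientD",
           "Normoxia.Drug", "Hypoxia.NoDrug", "Hypoxia.Drug")

# Hypothetical helper: contrast vector with the given weights, zero elsewhere
make_contrast <- function(weights) {
  v <- setNames(numeric(length(coefs)), coefs)
  v[names(weights)] <- weights
  v
}

contrasts <- cbind(
  HypDrug_vs_NormNoDrug   = make_contrast(c(Hypoxia.Drug = 1)),
  HypDrug_vs_NormDrug     = make_contrast(c(Hypoxia.Drug = 1, Normoxia.Drug = -1)),
  HypDrug_vs_HypNoDrug    = make_contrast(c(Hypoxia.Drug = 1, Hypoxia.NoDrug = -1)),
  NormDrug_vs_NormNoDrug  = make_contrast(c(Normoxia.Drug = 1)),
  HypNoDrug_vs_NormNoDrug = make_contrast(c(Hypoxia.NoDrug = 1))
)
print(contrasts)
```

The intercept and Patient rows are zero in every column, so patient baselines never enter a comparison. A column of this matrix could then be passed to, for example, limma's `contrasts.fit()` or the `contrast` argument of edgeR's `glmQLFTest()`.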

Please let me know if this design matrix is appropriate. Many thanks in advance for your help; I always underestimate how difficult it is to set up the design matrix!

Kind regards,


WEHI, Melbourne, Australia

Your design matrix is correct. You are right that this is a blocked experiment, similar to a paired experiment but with four treatments instead of two. The model allows you to compare treatment groups within patients, which is the standard way to analyse this sort of experiment, and to find treatment differences that are consistent between patients, which is what you want.
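
To see why comparing within patients removes patient-to-patient differences: in a balanced block design like this one, the Group effect estimated from `~ Patient + Group` equals the average of the within-patient differences, so each patient's baseline cancels. The sketch below checks this with ordinary least squares (`lm()`) on a single gene; the expression values are simulated random numbers for illustration only, not data from this experiment, and `baseline` is a hypothetical per-patient offset.

```r
set.seed(1)
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
Group   <- factor(rep(group_levels, times = 4), levels = group_levels)

# Simulated log-expression for one gene: patient baselines differ widely
baseline <- c(A = 2, B = 5, C = 8, D = 11)
y <- baseline[as.character(Patient)] + rnorm(16, sd = 0.5)
y[Group == "Hypoxia.Drug"] <- y[Group == "Hypoxia.Drug"] + 1.5  # true effect

fit <- lm(y ~ Patient + Group)
est <- coef(fit)["GroupHypoxia.Drug"]

# Within-patient differences, Hypoxia.Drug minus Normoxia.NoDrug (rows in patient order)
within_diff <- y[Group == "Hypoxia.Drug"] - y[Group == "Normoxia.NoDrug"]
all.equal(unname(est), mean(within_diff))  # TRUE
```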

You could also use this design matrix:

design <- model.matrix(~ 0 + Group + Patient)

This matrix is statistically equivalent to the one you have, but allows you to make contrasts between the treatment groups in the same way you would for a one-way layout. Note the order: this only works if Group comes immediately after the 0 + term.
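
The two parameterisations span the same column space, so they give identical fitted values; they differ only in what each coefficient means. With `~ 0 + Group + Patient`, each Group coefficient is the fitted value for the reference patient (A) in that group, so every comparison is a plain difference of two Group columns. A base-R sketch on one gene, using simulated values purely for illustration:

```r
set.seed(2)
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
Group   <- factor(rep(group_levels, times = 4), levels = group_levels)
y <- rnorm(16)  # simulated expression values for one gene

d2 <- model.matrix(~ 0 + Group + Patient)  # one column per Group
colnames(d2)

f1 <- lm(y ~ Patient + Group)       # intercept parameterisation
f2 <- lm(y ~ 0 + Group + Patient)   # no-intercept parameterisation
all.equal(fitted(f1), fitted(f2))   # TRUE: same model

# Hypoxia.Drug vs Normoxia.NoDrug: a difference of two columns in f2,
# a single coefficient in f1
all.equal(unname(coef(f2)["GroupHypoxia.Drug"] - coef(f2)["GroupNormoxia.NoDrug"]),
          unname(coef(f1)["GroupHypoxia.Drug"]))  # TRUE

# Order matters: with Patient first, Patient gets one column per level
# and Group loses its reference level instead
colnames(model.matrix(~ 0 + Patient + Group))
```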


Thank you so much for your response Gordon, I truly appreciate it! Also glad to hear my design matrix is correct in this context :)


I have added a suggestion about an alternative representation without the intercept that might help you.

United States


I don't have a lot of extra time to answer statistical design questions on the support site; I have to limit myself to software-related questions.

I recommend working with a statistical collaborator or someone familiar with linear models in R.

