edgeR/limma help with most appropriate design matrix
Entering edit mode
Last seen 18 hours ago

Hi all,

I am trying to determine the most appropriate design matrix to use for an RNA-seq experiment detailed below. Since my statistics knowledge isn't strong, I'd really appreciate some advice.

Experimental design

We have four patients (specifically patient-derived cell-lines, A - D) and each patient was subjected to four different conditions: a combination of Oxygen (normoxia or hypoxia) plus Treatment (no drug or drug). I combined these conditions into a single factor (Group) with four levels to help with construction of the design matrix.

> library(tidyverse)
> targets <-
  data.frame("Patient" = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
             "Oxygen" = c(rep("Normoxia", 2), rep("Hypoxia", 2)),
             "Treatment" = c("NoDrug", "Drug")) %>% 
  unite("Group", Oxygen:Treatment, sep = ".", remove = FALSE) %>%
  mutate(Oxygen = factor(Oxygen, levels = c("Normoxia", "Hypoxia")),
         Treatment = factor(Treatment, levels = c("NoDrug", "Drug")),
         Group = 
                  levels = c("Normoxia.NoDrug", "Normoxia.Drug",
                             "Hypoxia.NoDrug", "Hypoxia.Drug"))) %>% 
  select(Patient, Oxygen, Treatment, Group)

> targets

Patient Oxygen      Treatment   Group
A       Normoxia    NoDrug      Normoxia.NoDrug 
A       Normoxia    Drug        Normoxia.Drug   
A       Hypoxia     NoDrug      Hypoxia.NoDrug  
A       Hypoxia     Drug        Hypoxia.Drug    
B       Normoxia    NoDrug      Normoxia.NoDrug 
B       Normoxia    Drug        Normoxia.Drug   
B       Hypoxia     NoDrug      Hypoxia.NoDrug  
B       Hypoxia     Drug        Hypoxia.Drug    
C       Normoxia    NoDrug      Normoxia.NoDrug 
C       Normoxia    Drug        Normoxia.Drug
C       Hypoxia     NoDrug      Hypoxia.NoDrug  
C       Hypoxia     Drug        Hypoxia.Drug    
D       Normoxia    NoDrug      Normoxia.NoDrug 
D       Normoxia    Drug        Normoxia.Drug   
D       Hypoxia     NoDrug      Hypoxia.NoDrug  
D       Hypoxia     Drug        Hypoxia.Drug

Key biological questions

We want to find differentially expressed genes between the following:

  • Hypoxia.Drug vs Normoxia.NoDrug
  • Hypoxia.Drug vs Normoxia.Drug
  • Hypoxia.Drug vs Hypoxia.NoDrug

Additionally (but not as important):

  • Normoxia.Drug vs Normoxia.NoDrug
  • Hypoxia.NoDrug vs Normoxia.NoDrug

Possible design matrix?

I have tried my best to find a similar experimental design, I think it is most similar to Section 3.4.2 Blocking of the edgeR user's guide, but in this case we have 4 x sets of four samples (rather than paired samples), is that correct?

Here is the PCA plot for reference, as you can see the samples largely separate by Group, apart from Patient A. Therefore, I am thinking an additive model will be most appropriate, that includes a Patient term and Group term (combination of Oxygen and Treatment) as per below?

Group <- targets$Group
Patient <- targets$Patient
design <- model.matrix(~Patient + Group)
colnames(design) <- gsub(x = colnames(design), pattern = "Group",
                         replacement = "")
[1] "(Intercept)"    "PatientB"       "PatientC"       "PatientD"       "Normoxia.Drug"  "Hypoxia.NoDrug" "Hypoxia.Drug"

With an additive model, I believe that the assumption is that the Oxygen plus Treatment combination has the same effect on all patients, which is what we see here? And the coefficients from this model can be easily used to form the contrasts of interest above.

Please let me know if this design matrix is appropriate. Many thanks in advance for your help, I always underestimate how difficult it is to set up the design matrix!

Kind regards,


designmatrix modelmatrix rnaseq limma edgeR • 83 views

Login before adding your answer.

Traffic: 315 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6