Question

DESeq2/edgeR/limma help with most appropriate design matrix

rebecca.lea.johnston • 0

@rebeccaleajohnston-22750

Australia

Hi all,

I am trying to determine the most appropriate design matrix to use for an RNA-seq experiment detailed below. Since my statistics knowledge isn't strong, I'd really appreciate some advice.

Experimental design

We have four patients (specifically patient-derived cell-lines, A - D) and each patient was subjected to four different conditions: a combination of Oxygen (normoxia or hypoxia) plus Treatment (no drug or drug). I combined these conditions into a single factor (Group) with four levels to help with construction of the design matrix.

library(tidyverse)
targets <-
  data.frame("Patient" = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
             "Oxygen" = c(rep("Normoxia", 2), rep("Hypoxia", 2)),
             "Treatment" = c("NoDrug", "Drug")) %>% 
  unite("Group", Oxygen:Treatment, sep = ".", remove = FALSE) %>%
  mutate(Oxygen = factor(Oxygen, levels = c("Normoxia", "Hypoxia")),
         Treatment = factor(Treatment, levels = c("NoDrug", "Drug")),
         Group = 
           factor(Group, 
                  levels = c("Normoxia.NoDrug", "Normoxia.Drug",
                             "Hypoxia.NoDrug", "Hypoxia.Drug"))) %>% 
  select(Patient, Oxygen, Treatment, Group)

targets

Patient Oxygen      Treatment   Group
A       Normoxia    NoDrug      Normoxia.NoDrug 
A       Normoxia    Drug        Normoxia.Drug   
A       Hypoxia     NoDrug      Hypoxia.NoDrug  
A       Hypoxia     Drug        Hypoxia.Drug    
B       Normoxia    NoDrug      Normoxia.NoDrug 
B       Normoxia    Drug        Normoxia.Drug   
B       Hypoxia     NoDrug      Hypoxia.NoDrug  
B       Hypoxia     Drug        Hypoxia.Drug    
C       Normoxia    NoDrug      Normoxia.NoDrug 
C       Normoxia    Drug        Normoxia.Drug
C       Hypoxia     NoDrug      Hypoxia.NoDrug  
C       Hypoxia     Drug        Hypoxia.Drug    
D       Normoxia    NoDrug      Normoxia.NoDrug 
D       Normoxia    Drug        Normoxia.Drug   
D       Hypoxia     NoDrug      Hypoxia.NoDrug  
D       Hypoxia     Drug        Hypoxia.Drug

Key biological questions

We want to find differentially expressed genes between the following:

Hypoxia.Drug vs Normoxia.NoDrug
Hypoxia.Drug vs Normoxia.Drug
Hypoxia.Drug vs Hypoxia.NoDrug

Additionally (but not as important):

Normoxia.Drug vs Normoxia.NoDrug
Hypoxia.NoDrug vs Normoxia.NoDrug

Possible design matrix?

I have tried my best to find a similar experimental design. I think it is most similar to Section 3.4.2 (Blocking) of the edgeR user's guide, but in this case we have four sets of four samples (rather than paired samples). Is that correct?

Here is the PCA plot for reference. As you can see, the samples largely separate by Group, apart from Patient A. I am therefore thinking that an additive model with a Patient term and a Group term (the combination of Oxygen and Treatment) will be most appropriate, as per below?

Group <- targets$Group
Patient <- targets$Patient
# For DESeq2, supply the design as a formula (design = ~ Patient + Group) instead of calling model.matrix()
design <- model.matrix(~Patient + Group) 
colnames(design) <- gsub(x = colnames(design), pattern = "Group",
                         replacement = "")
colnames(design)
[1] "(Intercept)"    "PatientB"       "PatientC"       "PatientD"       "Normoxia.Drug"  "Hypoxia.NoDrug" "Hypoxia.Drug"

With an additive model, I believe the assumption is that each Oxygen plus Treatment combination has the same effect in all patients, which is what we see here? The coefficients from this model can then be used directly to form the contrasts of interest above.
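
As a rough illustration of how the five comparisons listed earlier map onto the coefficients of this design, here is a minimal base R sketch. The contrast names (HD_vs_NN and so on) and the object name contrast_matrix are made up for this example, and the sample table is rebuilt without tidyverse. A matrix of this shape, with one row per design column, is the kind of object limma's contrasts.fit() takes, but nothing here depends on limma.

```r
# Rebuild the sample layout from the post (base R only).
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Group <- factor(rep(group_levels, times = 4), levels = group_levels)

# Additive design: patient blocking term plus group term.
design <- model.matrix(~ Patient + Group)
colnames(design) <- sub("^Group", "", colnames(design))

# One column per comparison, one row per coefficient.
# Normoxia.NoDrug is the reference level, so comparisons against it
# are single coefficients; the others are differences of coefficients.
# The contrast names below are hypothetical labels for this sketch.
contrast_names <- c("HD_vs_NN", "HD_vs_ND", "HD_vs_HN", "ND_vs_NN", "HN_vs_NN")
contrast_matrix <- matrix(0, nrow = ncol(design), ncol = length(contrast_names),
                          dimnames = list(colnames(design), contrast_names))

contrast_matrix["Hypoxia.Drug", "HD_vs_NN"] <- 1
contrast_matrix[c("Hypoxia.Drug", "Normoxia.Drug"), "HD_vs_ND"] <- c(1, -1)
contrast_matrix[c("Hypoxia.Drug", "Hypoxia.NoDrug"), "HD_vs_HN"] <- c(1, -1)
contrast_matrix["Normoxia.Drug", "ND_vs_NN"] <- 1
contrast_matrix["Hypoxia.NoDrug", "HN_vs_NN"] <- 1

print(contrast_matrix)
```

The intercept and Patient rows stay at zero in every column, because the patient baselines are not part of any of the comparisons.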

Please let me know if this design matrix is appropriate. Many thanks in advance for your help, I always underestimate how difficult it is to set up the design matrix!

Kind regards,

Rebecca

DESeq2 limma modelmatrix edgeR designmatrix • 1.7k views

Michael Love 43k

@mikelove

United States

Hi,

I don't have a lot of extra time to answer statistical design questions on the support site, so I have to limit myself to software-related questions.

I recommend working with a statistical collaborator or someone familiar with linear models in R.

score 1 · Accepted Answer · 2021-10-27

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Your design matrix is correct. You are right that this is a blocked experiment, similar to a paired experiment but with 4 treatments instead of 2. The model allows you to compare treatment groups within patients, which is the standard way to analyse this sort of experiment. It finds treatment differences that are consistent between patients, which is what you want.
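
To make the "within patients" point concrete, here is a small base R sketch on simulated values for a single hypothetical gene. The patient baselines, group effects, and noise level are invented for illustration. Because the design is balanced, the fitted Hypoxia.Drug coefficient should match the average of the four within-patient differences, so the large patient-to-patient baseline shifts drop out.

```r
set.seed(1)

# Sample layout from the question: 4 patients x 4 groups.
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Group <- factor(rep(group_levels, times = 4), levels = group_levels)

# Invented log-expression values for one gene (assumptions for this sketch):
# large patient baselines, modest group effects, small noise.
patient_effect <- c(A = 0, B = 3, C = -2, D = 5)
group_effect <- c(Normoxia.NoDrug = 0, Normoxia.Drug = 0.5,
                  Hypoxia.NoDrug = 1, Hypoxia.Drug = 2)
y <- unname(patient_effect[as.character(Patient)] +
            group_effect[as.character(Group)] +
            rnorm(16, sd = 0.1))

# Fit the additive (blocked) model by least squares.
design <- model.matrix(~ Patient + Group)
fit <- lm.fit(design, y)
coef_hd <- unname(fit$coefficients["GroupHypoxia.Drug"])

# Within-patient differences, Hypoxia.Drug minus Normoxia.NoDrug.
# Both subsets are ordered by patient A to D, so they pair up correctly.
within_diff <- y[Group == "Hypoxia.Drug"] - y[Group == "Normoxia.NoDrug"]

# The two numbers agree up to floating-point error.
print(c(model_coefficient = coef_hd, mean_within_diff = mean(within_diff)))
```

The same agreement would not be guaranteed in an unbalanced design (for example, if a sample were missing), but the interpretation of the coefficient as a within-patient comparison still holds.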

You could also use this design matrix:

design <- model.matrix(~ 0 + Group + Patient)

This matrix is statistically equivalent to what you have, but it allows you to make contrasts between the treatment groups in the same way you would for a one-way layout. Note the order: this only works if Group follows immediately after the 0+ term.
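
A short base R sketch of the points above, under the same sample layout as in the question. It checks that the two design matrices span the same column space (hence "statistically equivalent"), writes one contrast in the one-way style, and shows what happens when Patient is placed first instead. The helper rank_of and the object names are made up for this example.

```r
Patient <- factor(rep(c("A", "B", "C", "D"), each = 4))
group_levels <- c("Normoxia.NoDrug", "Normoxia.Drug",
                  "Hypoxia.NoDrug", "Hypoxia.Drug")
Group <- factor(rep(group_levels, times = 4), levels = group_levels)

design_intercept <- model.matrix(~ Patient + Group)      # original design
design_means     <- model.matrix(~ 0 + Group + Patient)  # no-intercept version
colnames(design_means) <- sub("^Group", "", colnames(design_means))

# Equivalence check: both matrices have the same rank, and stacking them
# side by side adds no new directions, so they span the same column space.
rank_of <- function(m) qr(m)$rank   # hypothetical helper for this sketch
print(c(rank_of(design_intercept),
        rank_of(design_means),
        rank_of(cbind(design_intercept, design_means))))

# With all four group columns present, every comparison is a plain
# difference of two group columns, as in a one-way layout.
hd_vs_nd <- setNames(numeric(ncol(design_means)), colnames(design_means))
hd_vs_nd[c("Hypoxia.Drug", "Normoxia.Drug")] <- c(1, -1)
print(hd_vs_nd)

# Order matters: with Patient first, Patient gets all four columns and
# Group loses its reference level again.
design_patient_first <- model.matrix(~ 0 + Patient + Group)
print(colnames(design_patient_first))
```

In the Group-first version, each group column is the expected level for that group in the baseline patient (A), which is why differences between group columns still give within-patient comparisons.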

Thank you so much for your response Gordon, I truly appreciate it! Also glad to hear my design matrix is correct in this context :)

Reply by rebecca.lea.johnston • 3.1 years ago

I have added a suggestion about an alternative representation without the intercept that might help you.

Reply by Gordon Smyth • 3.1 years ago