Question: DESEq2 Paired samples Before and after treatment
0
gravatar for adesrichard
2.9 years ago by
adesrichard0 wrote:

Hi everyone,

I would like to use DESeq2 to process RNASeq paired samples (before and on treatment). I read very carefully the different posts and tutorial but I am still uncertain on how to design my experiment formula. Here are my experimental factors :

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

I would like to test for the genes that are significantly changing during the treatment (delta between Pre and On treatment) and are specific to responders (Response=Y). I would like to also correct for a batch effect (factor Status) as some patients  have previously being treated with another drug. 

What would be the appropriate design to perform such and experiment?

Thank you so much for your reply!

Best,

Alex

 

 

deseq2 paired samples • 5.5k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by adesrichard0
The table (to make the post easier to read):

> data
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N  Naive
4           2        On        N  Naive
5           3       Pre        N  Naive
6           3        On        N  Naive
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N Second
11          6       Pre        Y Second
12          6        On        Y Second
13          7       Pre        Y Second
14          7        On        Y Second
15          8       Pre        Y Second
16          8        On        Y Second

 

ADD REPLYlink written 2.9 years ago by Simon Anders3.5k

Hi Simon,

I modified the code to make up for the  mistake you caught up.

Thanks a lot!

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by adesrichard0
1

Again, to make it easier to follow here is the sample information (using your updated code). You should include this table in your post in future questions to Bioconductor to help people in understanding your experimental design.

> df[order(df$Response),]
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N Second
4           2        On        N Second
5           3       Pre        N Second
6           3        On        N Second
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N  Naive
11          6       Pre        Y  Naive
12          6        On        Y  Naive
13          7       Pre        Y  Naive
14          7        On        Y  Naive
15          8       Pre        Y Second
16          8        On        Y Second
ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Michael Love23k
Answer: DESEq2 Paired samples Before and after treatment
0
gravatar for Simon Anders
2.9 years ago by
Simon Anders3.5k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.5k wrote:

Let's start with the easy one: Effect of treatment, accounting for the sample pairing:

~ Patient.ID + Treatment

Here, you fit an individual base line for each patient, so that patient-to-patient differences in expression before treatment are absorbed and you only look at the differences On-Pre.

Very important: You need to put a "factor(...)" around the Patient.ID when creating the data frame. Otherwise, R will consider the patient IDs as numeric value, thinking that Patient 4 is expected to have 4 times the expression of something as Patient 1. 

Also, you may want to use "relevel" to make "Pre" the first and "On" the second factor. (Default is alphabetic ordering.) Otehrwise, all the signs come out the wrong way round.

 

Now, let's add the "Response" factor. You want to see how Treatment effect depends on Response, i.e., see an interaction between the two:

~ Patient.ID + Treatment * Response

Now, you get coefficients for patient base line and for treatment as before, but now also one for "Treatment:Response". The "Treatment" effect is now the effect of Treatment for non-responders ('non' because 'N' is the first level of Response), and "Treatment:Response" is the additional effect that comes on top of that if the patient is a responder.

R will also try to estimate a coefficient for "Response", i.e., base line differences between responders and non-responders. However, this will fail, because you already have individual base lines for each patient, and this absorbs all differences between patients, including those between reponders and non-responders. Therefore, you will get NAs for these ones.

 

If you want genes that react to treatment only in responders, you could find all genes with significant difference for Treatment:Response and significant non-difference (using the 'altHypothesis="lessAbs"' option of DESeq's result function) for Treatment. Or you simply look for significance in Treatment:Response and so get all genes for which reaction to treatment differs according to Responder status.

Plotting the log2 fold changes to treatment for non-responders and responders against each other will also be helpful.

 

Finally, Status: To block for these ones, you could use

~ Patient.ID + Treatment * ( Status + Response )

However,  Status is heavily confounded with Response. All naive patients are non-responders and all pre-treated patients are responders, except for Patient 5. Hence, for any difference you may find in the treatment response between patients 6-8 versus patients 1-4, there is no way of saying whether they are due to Responder or due to pre-treatment Status. This actually a very serious issue, compromising your entire study.

On the other hand: Patient 5 switches pre-treatment status from Pre to On. Are you sure your table is correct?

ADD COMMENTlink written 2.9 years ago by Simon Anders3.5k

We don't fill in NA for the columns of the design matrix that are necessary to remove to make the model matrix full rank, as in lm(). What I do is to point people to the section of the vignette which discusses "Model matrix not full rank" (in fact the error itself instructs them to read this) where there is a section discussing: "Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects.". This gives a way to fit the model with a model matrix that is full rank.

ADD REPLYlink written 2.9 years ago by Michael Love23k
Answer: DESEq2 Paired samples Before and after treatment
0
gravatar for adesrichard
2.9 years ago by
adesrichard0 wrote:

Hi Simon,

 

Thank you very much for your quick reply!

I tried to follow you instructions and had no problem with the following design:

~ Patient.ID + Treatment

However, for the two other design, it seems like my model matrix is not full rank and DESEq give me the following error.

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

dds=makeExampleDESeqDataSet(n = 1000, m = 16, betaSD = 0, interceptMean = 4,
                        interceptSD = 2, dispMeanRel = function(x) 4/x + 0.1,
                        sizeFactors = rep(1, 16))
colData(dds)=DataFrame(df)
design(dds)=~ Patient.ID + Treatment * (Status + Response )
dds=DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Do you think there is a a way around this? I tried to build my own matrix using the same design :

mm=model.matrix(~ Patient.ID + Treatment * ( Status + Response ),colData(dds))
mm=mm[,which(colSums(mm)>0)]
dds=DESeq(dds,full = mm)

But I still get the same error. I was wondering whether it is wise to use `ignoreRank = T` when I build my dataset?

Thanks a lot!

Best,

Alex

ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by adesrichard0
1

You should upgrade to the latest version of DESeq2 which is version 12. For a few releases now, the "model matrix not full rank" error points people to a specific place in the vignette where you can find the solution.

ADD REPLYlink written 2.9 years ago by Michael Love23k

Hello Michael, 

Thank you for your comment! I am running DESeq2 version 1.12.3. The complete error I am getting is as followed :

using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

 

Thank you very much for your input!

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by adesrichard0

Hi Mike,

I read the instruction about the interaction term and nested individuals but I am still unclear what design I should use and what parameters to input in the `DESeq()` function. This is mainly because the number of responders and non-responders is different in my design, which is not the case in the DESEq2 vignette.

df=data.frame( Patient.ID = factor(rep(1:9,each=2)),
               Treatment = factor(rep(c("Pre","On"),9),
                                  levels=c("Pre","On")),
               Response = factor(c(rep(("N"),10),rep("Y",8)),
                                 levels=c("N","Y")),
               Status = factor(c(rep("Naive",4),
                                 rep("Second",4),
                                 rep("Naive",8),
                                 rep("Second",2)),
                               levels=c("Naive","Second")))

> df

   Patient.ID Treatment Response Status Patient.n
1           1       Pre        N  Naive         1
2           1        On        N  Naive         1
3           2       Pre        N  Naive         2
4           2        On        N  Naive         2
5           3       Pre        N Second         3
6           3        On        N Second         3
7           4       Pre        N Second         4
8           4        On        N Second         4
9           5       Pre        N  Naive         5
10          5        On        N  Naive         5
11          6       Pre        Y  Naive         1
12          6        On        Y  Naive         1
13          7       Pre        Y  Naive         2
14          7        On        Y  Naive         2
15          8       Pre        Y  Naive         3
16          8        On        Y  Naive         3
17          9       Pre        Y Second         4
18          9        On        Y Second         4


 Is there any special parameters that should be considered before runninf the DESEq() function in order to get the difference between the Response status looking at the Pre/On treamtment delta?

Thank you very much!

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by adesrichard0

Because you have patient.n up to 5 in non-responders, you will have to remove an empty column which will be the patient.n = 5 effect in responders. You can do this by forming the model matrix, removing the column (which will have all zeros) and then supplying the model matrix to the 'full' argument of DESeq(). You can form the model matrix with:

mm <- model.matrix(*insert design here*,*insert data frame here*)

I think the design you want is then ~status + response + response:patient.n + response:treatment.

 

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Michael Love23k

Thank you very much Michael for the guidance!

Does this means that patient 5 will be eliminated from the response analysis? In this case, isn't the choice of which patient is being excluded/included in the analysis be considered? If yes, let's say I have 100 patients instead, keeping the same proportion, permuting the patients that are being excluded in the design and keeping only common DEGs between the permutations might be a valid alternative?

 

Thanks again guys for your precious input!

ADD REPLYlink written 2.9 years ago by adesrichard0

Sorry, no, I'm not suggesting to remove patient 5. What I was trying to say is that, when you form the model matrix, look at the column names. And then you'll find that one of the columns is just a vector of 0's. The column you want to remove will be named something like "responseY.Patient.n5". You'll need to remove this column before providing the model matrix to 'full' in DESeq(). 

Dealing with columns of 0's in the model matrix is also described in the DESeq2 vignette. See Section 3.12.2 "Levels without samples".

ADD REPLYlink written 2.9 years ago by Michael Love23k
Answer: DESEq2 Paired samples Before and after treatment
0
gravatar for adesrichard
2.9 years ago by
adesrichard0 wrote:

Thank you very much for all the help Michael!

After following your instructions, I first tested the design which worked well.

~Response+ Response:Patient.n + Response:Treatment

However, when I add status to the design, I have model that is not full ranked again. I tried to add a nested Status column to the design bit it was unsuccessful :

> df
   Patient.ID Treatment Response Status Patient.n Status.n
1           1       Pre        N  Naive         1        1
2           1        On        N  Naive         1        1
7           4       Pre        N  Naive         4        2
8           4        On        N  Naive         4        2
9           5       Pre        N  Naive         5        3
10          5        On        N  Naive         5        3
11          6       Pre        Y  Naive         1        4
12          6        On        Y  Naive         1        4
13          7       Pre        Y  Naive         2        5
14          7        On        Y  Naive         2        5
3           2       Pre        N Second         2        1
4           2        On        N Second         2        1
5           3       Pre        N Second         3        2
6           3        On        N Second         3        2
15          8       Pre        Y Second         3        3
16          8        On        Y Second         3        3

colData(dds)=DataFrame(df)
design(dds)=~Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment
mm=model.matrix(~ Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment,colData(dds))
mm=mm[,colSums(mm)>0]
dds=DESeq(dds,full = mm,betaPrior = F)
Error in checkFullRank(full) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Can a second nested factor be thrown in the design in order to take into consideration the fact that before treatment (Treatment.Pre), the baseline of `Status` is different between 2 cohort, like a batch effect?

Thanks again for your precious help!

ADD COMMENTlink written 2.9 years ago by adesrichard0

Sorry, you are right. I wrote too quickly before. I was responding while sitting at a conference (Bioc2016 though...). I should have pointed out that status is already controlled for when you control for patient effects. You are comparing treatment effects within patient so you don't need to, or in another sense, already have controlled for any differences between status naive and status second samples.

You should just use a design of ~response + response:patient.n + response:treatment.

 

ADD REPLYlink written 2.9 years ago by Michael Love23k
Answer: DESEq2 Paired samples Before and after treatment
0
gravatar for adesrichard
2.9 years ago by
adesrichard0 wrote:

That makes sense indeed! 

Thank you very much guys, this solved my problems and answered my questions.

Thank you for your support!

Alex

ADD COMMENTlink written 2.9 years ago by adesrichard0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 417 users visited in the last hour