DESEq2 Paired samples Before and after treatment
4
3
Entering edit mode
adesrichard ▴ 30
@adesrichard-10969
Last seen 7.7 years ago

Hi everyone,

I would like to use DESeq2 to process RNASeq paired samples (before and on treatment). I read very carefully the different posts and tutorial but I am still uncertain on how to design my experiment formula. Here are my experimental factors :

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

I would like to test for the genes that are significantly changing during the treatment (delta between Pre and On treatment) and are specific to responders (Response=Y). I would like to also correct for a batch effect (factor Status) as some patients  have previously being treated with another drug. 

What would be the appropriate design to perform such and experiment?

Thank you so much for your reply!

Best,

Alex

 

 

deseq2 paired samples • 31k views
ADD COMMENT
1
Entering edit mode
The table (to make the post easier to read):

> data
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N  Naive
4           2        On        N  Naive
5           3       Pre        N  Naive
6           3        On        N  Naive
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N Second
11          6       Pre        Y Second
12          6        On        Y Second
13          7       Pre        Y Second
14          7        On        Y Second
15          8       Pre        Y Second
16          8        On        Y Second

 

ADD REPLY
0
Entering edit mode

Hi Simon,

I modified the code to make up for the  mistake you caught up.

Thanks a lot!

ADD REPLY
2
Entering edit mode

Again, to make it easier to follow here is the sample information (using your updated code). You should include this table in your post in future questions to Bioconductor to help people in understanding your experimental design.

> df[order(df$Response),]
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N Second
4           2        On        N Second
5           3       Pre        N Second
6           3        On        N Second
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N  Naive
11          6       Pre        Y  Naive
12          6        On        Y  Naive
13          7       Pre        Y  Naive
14          7        On        Y  Naive
15          8       Pre        Y Second
16          8        On        Y Second
ADD REPLY
5
Entering edit mode
Simon Anders ★ 3.8k
@simon-anders-3855
Last seen 4.4 years ago
Zentrum für Molekularbiologie, Universi…

Let's start with the easy one: Effect of treatment, accounting for the sample pairing:

~ Patient.ID + Treatment

Here, you fit an individual base line for each patient, so that patient-to-patient differences in expression before treatment are absorbed and you only look at the differences On-Pre.

Very important: You need to put a "factor(...)" around the Patient.ID when creating the data frame. Otherwise, R will consider the patient IDs as numeric value, thinking that Patient 4 is expected to have 4 times the expression of something as Patient 1. 

Also, you may want to use "relevel" to make "Pre" the first and "On" the second factor. (Default is alphabetic ordering.) Otehrwise, all the signs come out the wrong way round.

 

Now, let's add the "Response" factor. You want to see how Treatment effect depends on Response, i.e., see an interaction between the two:

~ Patient.ID + Treatment * Response

Now, you get coefficients for patient base line and for treatment as before, but now also one for "Treatment:Response". The "Treatment" effect is now the effect of Treatment for non-responders ('non' because 'N' is the first level of Response), and "Treatment:Response" is the additional effect that comes on top of that if the patient is a responder.

R will also try to estimate a coefficient for "Response", i.e., base line differences between responders and non-responders. However, this will fail, because you already have individual base lines for each patient, and this absorbs all differences between patients, including those between reponders and non-responders. Therefore, you will get NAs for these ones.

 

If you want genes that react to treatment only in responders, you could find all genes with significant difference for Treatment:Response and significant non-difference (using the 'altHypothesis="lessAbs"' option of DESeq's result function) for Treatment. Or you simply look for significance in Treatment:Response and so get all genes for which reaction to treatment differs according to Responder status.

Plotting the log2 fold changes to treatment for non-responders and responders against each other will also be helpful.

 

Finally, Status: To block for these ones, you could use

~ Patient.ID + Treatment * ( Status + Response )

However,  Status is heavily confounded with Response. All naive patients are non-responders and all pre-treated patients are responders, except for Patient 5. Hence, for any difference you may find in the treatment response between patients 6-8 versus patients 1-4, there is no way of saying whether they are due to Responder or due to pre-treatment Status. This actually a very serious issue, compromising your entire study.

On the other hand: Patient 5 switches pre-treatment status from Pre to On. Are you sure your table is correct?

ADD COMMENT
0
Entering edit mode

We don't fill in NA for the columns of the design matrix that are necessary to remove to make the model matrix full rank, as in lm(). What I do is to point people to the section of the vignette which discusses "Model matrix not full rank" (in fact the error itself instructs them to read this) where there is a section discussing: "Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects.". This gives a way to fit the model with a model matrix that is full rank.

ADD REPLY
1
Entering edit mode

Michael Love

I know this is a very old thread, but some ppl might want to be able to generate the ind.n column programmatically, in particular when there are many, many samples.

In that case, the following code, which is based on the example from the vignette session "Group-specific condition effects, individuals nested within groups", should do it:

library(magrittr)
library(dplyr)

coldata <- DataFrame(grp=factor(rep(c("X","Y"),each=6)),
                  ind=factor(rep(1:6,each=2)),
                  cnd=factor(rep(c("A","B"),6)))

coldata

as.data.frame(coldata)

coldata %>%
    dplyr::as_tibble() %>%
    dplyr::group_by(grp) %>%
    dplyr::mutate(ind = as.factor(x = ind)) %>% # in case ind is not a factor yet
    dplyr::mutate(ind.n = setNames(object = 1:length(x = unique(x = ind)),
                                   nm = unique(x = ind))[levels(x = ind)[ind]] %>%
                      as.factor()) %>%
    dplyr::ungroup() %>%
    as.data.frame()

EDIT: forgot to make ind.n a factor.

ADD REPLY
0
Entering edit mode

I would like to ask a quick question. I get a bit confused about the paired analysis and interaction/contrasting option. I have a datasets with two timepoints (baseline and 6 weeks) and two groups (placebo and treatment group). at baseline both groups are naive. at 6 weeks, the treatment group is treated for 6 weeks with a drug, the other one is treated with a placebo. If you want to check the differential gene expression regarding the treatment effect after 6 weeks, the design matrix should be: ~ Patient.ID + Treatment. , correct? because both groups are naive at baseline and I have a paired samples

ADD REPLY
0
Entering edit mode

You should plan your statistical analysis with a local statistician or someone familiar with linear models (DESeq2 uses standard linear model coefficients).

I have to restrict my time on the support site to software related questions.

ADD REPLY
0
Entering edit mode
adesrichard ▴ 30
@adesrichard-10969
Last seen 7.7 years ago

Hi Simon,

 

Thank you very much for your quick reply!

I tried to follow you instructions and had no problem with the following design:

~ Patient.ID + Treatment

However, for the two other design, it seems like my model matrix is not full rank and DESEq give me the following error.

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

dds=makeExampleDESeqDataSet(n = 1000, m = 16, betaSD = 0, interceptMean = 4,
                        interceptSD = 2, dispMeanRel = function(x) 4/x + 0.1,
                        sizeFactors = rep(1, 16))
colData(dds)=DataFrame(df)
design(dds)=~ Patient.ID + Treatment * (Status + Response )
dds=DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Do you think there is a a way around this? I tried to build my own matrix using the same design :

mm=model.matrix(~ Patient.ID + Treatment * ( Status + Response ),colData(dds))
mm=mm[,which(colSums(mm)>0)]
dds=DESeq(dds,full = mm)

But I still get the same error. I was wondering whether it is wise to use `ignoreRank = T` when I build my dataset?

Thanks a lot!

Best,

Alex

ADD COMMENT
2
Entering edit mode

You should upgrade to the latest version of DESeq2 which is version 12. For a few releases now, the "model matrix not full rank" error points people to a specific place in the vignette where you can find the solution.

ADD REPLY
0
Entering edit mode

Hello Michael, 

Thank you for your comment! I am running DESeq2 version 1.12.3. The complete error I am getting is as followed :

using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

 

Thank you very much for your input!

ADD REPLY
0
Entering edit mode

Hi Mike,

I read the instruction about the interaction term and nested individuals but I am still unclear what design I should use and what parameters to input in the `DESeq()` function. This is mainly because the number of responders and non-responders is different in my design, which is not the case in the DESEq2 vignette.

df=data.frame( Patient.ID = factor(rep(1:9,each=2)),
               Treatment = factor(rep(c("Pre","On"),9),
                                  levels=c("Pre","On")),
               Response = factor(c(rep(("N"),10),rep("Y",8)),
                                 levels=c("N","Y")),
               Status = factor(c(rep("Naive",4),
                                 rep("Second",4),
                                 rep("Naive",8),
                                 rep("Second",2)),
                               levels=c("Naive","Second")))

> df

   Patient.ID Treatment Response Status Patient.n
1           1       Pre        N  Naive         1
2           1        On        N  Naive         1
3           2       Pre        N  Naive         2
4           2        On        N  Naive         2
5           3       Pre        N Second         3
6           3        On        N Second         3
7           4       Pre        N Second         4
8           4        On        N Second         4
9           5       Pre        N  Naive         5
10          5        On        N  Naive         5
11          6       Pre        Y  Naive         1
12          6        On        Y  Naive         1
13          7       Pre        Y  Naive         2
14          7        On        Y  Naive         2
15          8       Pre        Y  Naive         3
16          8        On        Y  Naive         3
17          9       Pre        Y Second         4
18          9        On        Y Second         4


 Is there any special parameters that should be considered before runninf the DESEq() function in order to get the difference between the Response status looking at the Pre/On treamtment delta?

Thank you very much!

ADD REPLY
0
Entering edit mode

Because you have patient.n up to 5 in non-responders, you will have to remove an empty column which will be the patient.n = 5 effect in responders. You can do this by forming the model matrix, removing the column (which will have all zeros) and then supplying the model matrix to the 'full' argument of DESeq(). You can form the model matrix with:

mm <- model.matrix(*insert design here*,*insert data frame here*)

I think the design you want is then ~status + response + response:patient.n + response:treatment.

 

ADD REPLY
0
Entering edit mode

Thank you very much Michael for the guidance!

Does this means that patient 5 will be eliminated from the response analysis? In this case, isn't the choice of which patient is being excluded/included in the analysis be considered? If yes, let's say I have 100 patients instead, keeping the same proportion, permuting the patients that are being excluded in the design and keeping only common DEGs between the permutations might be a valid alternative?

 

Thanks again guys for your precious input!

ADD REPLY
0
Entering edit mode

Sorry, no, I'm not suggesting to remove patient 5. What I was trying to say is that, when you form the model matrix, look at the column names. And then you'll find that one of the columns is just a vector of 0's. The column you want to remove will be named something like "responseY.Patient.n5". You'll need to remove this column before providing the model matrix to 'full' in DESeq(). 

Dealing with columns of 0's in the model matrix is also described in the DESeq2 vignette. See Section 3.12.2 "Levels without samples".

ADD REPLY
0
Entering edit mode
adesrichard ▴ 30
@adesrichard-10969
Last seen 7.7 years ago

Thank you very much for all the help Michael!

After following your instructions, I first tested the design which worked well.

~Response+ Response:Patient.n + Response:Treatment

However, when I add status to the design, I have model that is not full ranked again. I tried to add a nested Status column to the design bit it was unsuccessful :

> df
   Patient.ID Treatment Response Status Patient.n Status.n
1           1       Pre        N  Naive         1        1
2           1        On        N  Naive         1        1
7           4       Pre        N  Naive         4        2
8           4        On        N  Naive         4        2
9           5       Pre        N  Naive         5        3
10          5        On        N  Naive         5        3
11          6       Pre        Y  Naive         1        4
12          6        On        Y  Naive         1        4
13          7       Pre        Y  Naive         2        5
14          7        On        Y  Naive         2        5
3           2       Pre        N Second         2        1
4           2        On        N Second         2        1
5           3       Pre        N Second         3        2
6           3        On        N Second         3        2
15          8       Pre        Y Second         3        3
16          8        On        Y Second         3        3

colData(dds)=DataFrame(df)
design(dds)=~Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment
mm=model.matrix(~ Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment,colData(dds))
mm=mm[,colSums(mm)>0]
dds=DESeq(dds,full = mm,betaPrior = F)
Error in checkFullRank(full) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Can a second nested factor be thrown in the design in order to take into consideration the fact that before treatment (Treatment.Pre), the baseline of `Status` is different between 2 cohort, like a batch effect?

Thanks again for your precious help!

ADD COMMENT
0
Entering edit mode

Sorry, you are right. I wrote too quickly before. I was responding while sitting at a conference (Bioc2016 though...). I should have pointed out that status is already controlled for when you control for patient effects. You are comparing treatment effects within patient so you don't need to, or in another sense, already have controlled for any differences between status naive and status second samples.

You should just use a design of ~response + response:patient.n + response:treatment.

 

ADD REPLY
0
Entering edit mode

Hi Michael,

Does controlling for patient effects also controls for gender differences in this design? Similar to the asked question, I am also interested in identifying genes that are significantly changing during the treatment (between Pre and On treatment) and may confer poor response (delayed vs early relapse).

Best regards, Saurabh

ADD REPLY
0
Entering edit mode

Controlling for patient effects controls for every characteristic of the patient. You are comparing effects within each individual. You both cannot add patient level additive effects (it will not run due to linear dependence among the columns of the design matrix), nor do you need to.

With the above, you have response specific treatment effects. What you do with them is up to you, and for further questions, I'd recommend collaborating with a local statistician who can help you plan the analysis and interpretation of results.

ADD REPLY
0
Entering edit mode

Thank you very much, Dr. Love! As per my understanding, I am comparing the effect of the treatment within patients but how this difference is leading to the difference in response (early vs late), for this the comparison will be made between patients. This was my reason for asking the above questions.

ADD REPLY
0
Entering edit mode

This question about comparing the within-patient effects across groups of patients has its own section in the vignette.

This thread is quite old, I think the vignette is easier to follow than this thread.

ADD REPLY
0
Entering edit mode
adesrichard ▴ 30
@adesrichard-10969
Last seen 7.7 years ago

That makes sense indeed! 

Thank you very much guys, this solved my problems and answered my questions.

Thank you for your support!

Alex

ADD COMMENT

Login before adding your answer.

Traffic: 819 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6