Question

DESEq2 Paired samples Before and after treatment

3

Entering edit mode

adesrichard ▴ 30

@adesrichard-10969

Last seen 8.6 years ago

Hi everyone,

I would like to use DESeq2 to process RNASeq paired samples (before and on treatment). I read very carefully the different posts and tutorial but I am still uncertain on how to design my experiment formula. Here are my experimental factors :

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

I would like to test for the genes that are significantly changing during the treatment (delta between Pre and On treatment) and are specific to responders (Response=Y). I would like to also correct for a batch effect (factor Status) as some patients have previously being treated with another drug.

What would be the appropriate design to perform such and experiment?

Thank you so much for your reply!

Best,

Alex

deseq2 paired samples • 34k views

ADD COMMENT • link updated 22 months ago by Michael Love 43k • written 9.5 years ago by adesrichard ▴ 30

1

Entering edit mode

The table (to make the post easier to read):

> data
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N  Naive
4           2        On        N  Naive
5           3       Pre        N  Naive
6           3        On        N  Naive
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N Second
11          6       Pre        Y Second
12          6        On        Y Second
13          7       Pre        Y Second
14          7        On        Y Second
15          8       Pre        Y Second
16          8        On        Y Second

ADD REPLY • link 9.5 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Hi Simon,

I modified the code to make up for the mistake you caught up.

Thanks a lot!

ADD REPLY • link 9.5 years ago adesrichard ▴ 30

2

Entering edit mode

Again, to make it easier to follow here is the sample information (using your updated code). You should include this table in your post in future questions to Bioconductor to help people in understanding your experimental design.

> df[order(df$Response),]
   Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N Second
4           2        On        N Second
5           3       Pre        N Second
6           3        On        N Second
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N  Naive
11          6       Pre        Y  Naive
12          6        On        Y  Naive
13          7       Pre        Y  Naive
14          7        On        Y  Naive
15          8       Pre        Y Second
16          8        On        Y Second

ADD REPLY • link 9.5 years ago Michael Love 43k

score 5 · Answer 1 · 2016-06-24

Let's start with the easy one: Effect of treatment, accounting for the sample pairing:

~ Patient.ID + Treatment

Here, you fit an individual base line for each patient, so that patient-to-patient differences in expression before treatment are absorbed and you only look at the differences On-Pre.

Very important: You need to put a "factor(...)" around the Patient.ID when creating the data frame. Otherwise, R will consider the patient IDs as numeric value, thinking that Patient 4 is expected to have 4 times the expression of something as Patient 1.

Also, you may want to use "relevel" to make "Pre" the first and "On" the second factor. (Default is alphabetic ordering.) Otehrwise, all the signs come out the wrong way round.

Now, let's add the "Response" factor. You want to see how Treatment effect depends on Response, i.e., see an interaction between the two:

~ Patient.ID + Treatment * Response

Now, you get coefficients for patient base line and for treatment as before, but now also one for "Treatment:Response". The "Treatment" effect is now the effect of Treatment for non-responders ('non' because 'N' is the first level of Response), and "Treatment:Response" is the additional effect that comes on top of that if the patient is a responder.

R will also try to estimate a coefficient for "Response", i.e., base line differences between responders and non-responders. However, this will fail, because you already have individual base lines for each patient, and this absorbs all differences between patients, including those between reponders and non-responders. Therefore, you will get NAs for these ones.

If you want genes that react to treatment only in responders, you could find all genes with significant difference for Treatment:Response and significant non-difference (using the 'altHypothesis="lessAbs"' option of DESeq's result function) for Treatment. Or you simply look for significance in Treatment:Response and so get all genes for which reaction to treatment differs according to Responder status.

Plotting the log2 fold changes to treatment for non-responders and responders against each other will also be helpful.

Finally, Status: To block for these ones, you could use

~ Patient.ID + Treatment * ( Status + Response )

However, Status is heavily confounded with Response. All naive patients are non-responders and all pre-treated patients are responders, except for Patient 5. Hence, for any difference you may find in the treatment response between patients 6-8 versus patients 1-4, there is no way of saying whether they are due to Responder or due to pre-treatment Status. This actually a very serious issue, compromising your entire study.

On the other hand: Patient 5 switches pre-treatment status from Pre to On. Are you sure your table is correct?

score 0 · Answer 2 · 2016-06-24

0

Entering edit mode

adesrichard ▴ 30

@adesrichard-10969

Last seen 8.6 years ago

Hi Simon,

Thank you very much for your quick reply!

I tried to follow you instructions and had no problem with the following design:

~ Patient.ID + Treatment

However, for the two other design, it seems like my model matrix is not full rank and DESEq give me the following error.

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
                Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
              Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
                Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

dds=makeExampleDESeqDataSet(n = 1000, m = 16, betaSD = 0, interceptMean = 4,
                        interceptSD = 2, dispMeanRel = function(x) 4/x + 0.1,
                        sizeFactors = rep(1, 16))
colData(dds)=DataFrame(df)
design(dds)=~ Patient.ID + Treatment * (Status + Response )
dds=DESeq(dds)

estimating size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

Do you think there is a a way around this? I tried to build my own matrix using the same design :

mm=model.matrix(~ Patient.ID + Treatment * ( Status + Response ),colData(dds))
mm=mm[,which(colSums(mm)>0)]
dds=DESeq(dds,full = mm)

But I still get the same error. I was wondering whether it is wise to use `ignoreRank = T` when I build my dataset?

Thanks a lot!

Best,

Alex

ADD COMMENT • link 9.5 years ago adesrichard ▴ 30

2

Entering edit mode

You should upgrade to the latest version of DESeq2 which is version 12. For a few releases now, the "model matrix not full rank" error points people to a specific place in the vignette where you can find the solution.

ADD REPLY • link 9.5 years ago Michael Love 43k

0

Entering edit mode

Hello Michael,

Thank you for your comment! I am running DESeq2 version 1.12.3. The complete error I am getting is as followed :

using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Thank you very much for your input!

ADD REPLY • link 9.5 years ago adesrichard ▴ 30

0

Entering edit mode

Hi Mike,

I read the instruction about the interaction term and nested individuals but I am still unclear what design I should use and what parameters to input in the `DESeq()` function. This is mainly because the number of responders and non-responders is different in my design, which is not the case in the DESEq2 vignette.

df=data.frame( Patient.ID = factor(rep(1:9,each=2)),
               Treatment = factor(rep(c("Pre","On"),9),
                                  levels=c("Pre","On")),
               Response = factor(c(rep(("N"),10),rep("Y",8)),
                                 levels=c("N","Y")),
               Status = factor(c(rep("Naive",4),
                                 rep("Second",4),
                                 rep("Naive",8),
                                 rep("Second",2)),
                               levels=c("Naive","Second")))

> df

   Patient.ID Treatment Response Status Patient.n
1           1       Pre        N  Naive         1
2           1        On        N  Naive         1
3           2       Pre        N  Naive         2
4           2        On        N  Naive         2
5           3       Pre        N Second         3
6           3        On        N Second         3
7           4       Pre        N Second         4
8           4        On        N Second         4
9           5       Pre        N  Naive         5
10          5        On        N  Naive         5
11          6       Pre        Y  Naive         1
12          6        On        Y  Naive         1
13          7       Pre        Y  Naive         2
14          7        On        Y  Naive         2
15          8       Pre        Y  Naive         3
16          8        On        Y  Naive         3
17          9       Pre        Y Second         4
18          9        On        Y Second         4

Is there any special parameters that should be considered before runninf the DESEq() function in order to get the difference between the Response status looking at the Pre/On treamtment delta?

Thank you very much!

ADD REPLY • link 9.5 years ago adesrichard ▴ 30

0

Entering edit mode

Because you have patient.n up to 5 in non-responders, you will have to remove an empty column which will be the patient.n = 5 effect in responders. You can do this by forming the model matrix, removing the column (which will have all zeros) and then supplying the model matrix to the 'full' argument of DESeq(). You can form the model matrix with:

mm <- model.matrix(*insert design here*,*insert data frame here*)

I think the design you want is then ~status + response + response:patient.n + response:treatment.

ADD REPLY • link 9.5 years ago Michael Love 43k

0

Entering edit mode

Thank you very much Michael for the guidance!

Does this means that patient 5 will be eliminated from the response analysis? In this case, isn't the choice of which patient is being excluded/included in the analysis be considered? If yes, let's say I have 100 patients instead, keeping the same proportion, permuting the patients that are being excluded in the design and keeping only common DEGs between the permutations might be a valid alternative?

Thanks again guys for your precious input!

ADD REPLY • link 9.5 years ago adesrichard ▴ 30

0

Entering edit mode

Sorry, no, I'm not suggesting to remove patient 5. What I was trying to say is that, when you form the model matrix, look at the column names. And then you'll find that one of the columns is just a vector of 0's. The column you want to remove will be named something like "responseY.Patient.n5". You'll need to remove this column before providing the model matrix to 'full' in DESeq().

Dealing with columns of 0's in the model matrix is also described in the DESeq2 vignette. See Section 3.12.2 "Levels without samples".

ADD REPLY • link 9.5 years ago Michael Love 43k

score 0 · Answer 3 · 2016-06-26

0

Entering edit mode

adesrichard ▴ 30

@adesrichard-10969

Last seen 8.6 years ago

Thank you very much for all the help Michael!

After following your instructions, I first tested the design which worked well.

~Response+ Response:Patient.n + Response:Treatment

However, when I add status to the design, I have model that is not full ranked again. I tried to add a nested Status column to the design bit it was unsuccessful :

> df
   Patient.ID Treatment Response Status Patient.n Status.n
1           1       Pre        N  Naive         1        1
2           1        On        N  Naive         1        1
7           4       Pre        N  Naive         4        2
8           4        On        N  Naive         4        2
9           5       Pre        N  Naive         5        3
10          5        On        N  Naive         5        3
11          6       Pre        Y  Naive         1        4
12          6        On        Y  Naive         1        4
13          7       Pre        Y  Naive         2        5
14          7        On        Y  Naive         2        5
3           2       Pre        N Second         2        1
4           2        On        N Second         2        1
5           3       Pre        N Second         3        2
6           3        On        N Second         3        2
15          8       Pre        Y Second         3        3
16          8        On        Y Second         3        3

colData(dds)=DataFrame(df)
design(dds)=~Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment
mm=model.matrix(~ Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment,colData(dds))
mm=mm[,colSums(mm)>0]
dds=DESeq(dds,full = mm,betaPrior = F)

Error in checkFullRank(full) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

  Please read the vignette section 'Model matrix not full rank':

  vignette('DESeq2')

Can a second nested factor be thrown in the design in order to take into consideration the fact that before treatment (Treatment.Pre), the baseline of `Status` is different between 2 cohort, like a batch effect?

Thanks again for your precious help!

ADD COMMENT • link 9.5 years ago adesrichard ▴ 30

0

Entering edit mode

Sorry, you are right. I wrote too quickly before. I was responding while sitting at a conference (Bioc2016 though...). I should have pointed out that status is already controlled for when you control for patient effects. You are comparing treatment effects within patient so you don't need to, or in another sense, already have controlled for any differences between status naive and status second samples.

You should just use a design of ~response + response:patient.n + response:treatment.

ADD REPLY • link 9.5 years ago Michael Love 43k

0

Entering edit mode

Hi Michael,

Does controlling for patient effects also controls for gender differences in this design? Similar to the asked question, I am also interested in identifying genes that are significantly changing during the treatment (between Pre and On treatment) and may confer poor response (delayed vs early relapse).

Best regards, Saurabh

ADD REPLY • link 3.8 years ago Saurabh • 0

0

Entering edit mode

Controlling for patient effects controls for every characteristic of the patient. You are comparing effects within each individual. You both cannot add patient level additive effects (it will not run due to linear dependence among the columns of the design matrix), nor do you need to.

With the above, you have response specific treatment effects. What you do with them is up to you, and for further questions, I'd recommend collaborating with a local statistician who can help you plan the analysis and interpretation of results.

ADD REPLY • link 3.8 years ago Michael Love 43k

0

Entering edit mode

Thank you very much, Dr. Love! As per my understanding, I am comparing the effect of the treatment within patients but how this difference is leading to the difference in response (early vs late), for this the comparison will be made between patients. This was my reason for asking the above questions.

ADD REPLY • link 3.8 years ago Saurabh • 0

0

Entering edit mode

This question about comparing the within-patient effects across groups of patients has its own section in the vignette.

This thread is quite old, I think the vignette is easier to follow than this thread.

ADD REPLY • link 3.8 years ago Michael Love 43k

score 0 · Answer 4 · 2016-06-27

0

Entering edit mode

adesrichard ▴ 30

@adesrichard-10969

Last seen 8.6 years ago

That makes sense indeed!

Thank you very much guys, this solved my problems and answered my questions.

Thank you for your support!

Alex

ADD COMMENT • link 9.5 years ago adesrichard ▴ 30