Question: DESEq2 Paired samples Before and after treatment
1
3.2 years ago by

Hi everyone,

I would like to use DESeq2 to process RNASeq paired samples (before and on treatment). I read very carefully the different posts and tutorial but I am still uncertain on how to design my experiment formula. Here are my experimental factors :

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))


I would like to test for the genes that are significantly changing during the treatment (delta between Pre and On treatment) and are specific to responders (Response=Y). I would like to also correct for a batch effect (factor Status) as some patients  have previously being treated with another drug.

What would be the appropriate design to perform such and experiment?

Best,

Alex

deseq2 paired samples • 6.5k views
modified 3.2 years ago • written 3.2 years ago by adesrichard10
The table (to make the post easier to read):

> data
Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N  Naive
4           2        On        N  Naive
5           3       Pre        N  Naive
6           3        On        N  Naive
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N Second
11          6       Pre        Y Second
12          6        On        Y Second
13          7       Pre        Y Second
14          7        On        Y Second
15          8       Pre        Y Second
16          8        On        Y Second

Hi Simon,

I modified the code to make up for the  mistake you caught up.

Thanks a lot!

1

Again, to make it easier to follow here is the sample information (using your updated code). You should include this table in your post in future questions to Bioconductor to help people in understanding your experimental design.

> df[order(df\$Response),]
Patient.ID Treatment Response Status
1           1       Pre        N  Naive
2           1        On        N  Naive
3           2       Pre        N Second
4           2        On        N Second
5           3       Pre        N Second
6           3        On        N Second
7           4       Pre        N  Naive
8           4        On        N  Naive
9           5       Pre        N  Naive
10          5        On        N  Naive
11          6       Pre        Y  Naive
12          6        On        Y  Naive
13          7       Pre        Y  Naive
14          7        On        Y  Naive
15          8       Pre        Y Second
16          8        On        Y Second
Answer: DESEq2 Paired samples Before and after treatment
0
3.2 years ago by
Simon Anders3.6k
Zentrum für Molekularbiologie, Universität Heidelberg
Simon Anders3.6k wrote:

Let's start with the easy one: Effect of treatment, accounting for the sample pairing:

~ Patient.ID + Treatment

Here, you fit an individual base line for each patient, so that patient-to-patient differences in expression before treatment are absorbed and you only look at the differences On-Pre.

Very important: You need to put a "factor(...)" around the Patient.ID when creating the data frame. Otherwise, R will consider the patient IDs as numeric value, thinking that Patient 4 is expected to have 4 times the expression of something as Patient 1.

Also, you may want to use "relevel" to make "Pre" the first and "On" the second factor. (Default is alphabetic ordering.) Otehrwise, all the signs come out the wrong way round.

Now, let's add the "Response" factor. You want to see how Treatment effect depends on Response, i.e., see an interaction between the two:

~ Patient.ID + Treatment * Response

Now, you get coefficients for patient base line and for treatment as before, but now also one for "Treatment:Response". The "Treatment" effect is now the effect of Treatment for non-responders ('non' because 'N' is the first level of Response), and "Treatment:Response" is the additional effect that comes on top of that if the patient is a responder.

R will also try to estimate a coefficient for "Response", i.e., base line differences between responders and non-responders. However, this will fail, because you already have individual base lines for each patient, and this absorbs all differences between patients, including those between reponders and non-responders. Therefore, you will get NAs for these ones.

If you want genes that react to treatment only in responders, you could find all genes with significant difference for Treatment:Response and significant non-difference (using the 'altHypothesis="lessAbs"' option of DESeq's result function) for Treatment. Or you simply look for significance in Treatment:Response and so get all genes for which reaction to treatment differs according to Responder status.

Plotting the log2 fold changes to treatment for non-responders and responders against each other will also be helpful.

Finally, Status: To block for these ones, you could use

~ Patient.ID + Treatment * ( Status + Response )

However,  Status is heavily confounded with Response. All naive patients are non-responders and all pre-treated patients are responders, except for Patient 5. Hence, for any difference you may find in the treatment response between patients 6-8 versus patients 1-4, there is no way of saying whether they are due to Responder or due to pre-treatment Status. This actually a very serious issue, compromising your entire study.

On the other hand: Patient 5 switches pre-treatment status from Pre to On. Are you sure your table is correct?

We don't fill in NA for the columns of the design matrix that are necessary to remove to make the model matrix full rank, as in lm(). What I do is to point people to the section of the vignette which discusses "Model matrix not full rank" (in fact the error itself instructs them to read this) where there is a section discussing: "Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects.". This gives a way to fit the model with a model matrix that is full rank.

Answer: DESEq2 Paired samples Before and after treatment
0
3.2 years ago by

Hi Simon,

I tried to follow you instructions and had no problem with the following design:

~ Patient.ID + Treatment

However, for the two other design, it seems like my model matrix is not full rank and DESEq give me the following error.

df=data.frame(Patient.ID = factor(rep(1:8,each=2)),
Treatment = factor(rep(c("Pre","On"),8),levels=c("Pre","On")),
Response = factor(c(rep(("N"),10),rep("Y",6)),levels=c("N","Y")),
Status = factor(c(rep("Naive",2),rep("Second",4),rep("Naive",8),rep("Second",2)),levels=c("Naive","Second")))

dds=makeExampleDESeqDataSet(n = 1000, m = 16, betaSD = 0, interceptMean = 4,
interceptSD = 2, dispMeanRel = function(x) 4/x + 0.1,
sizeFactors = rep(1, 16))
colData(dds)=DataFrame(df)
design(dds)=~ Patient.ID + Treatment * (Status + Response )
dds=DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

Do you think there is a a way around this? I tried to build my own matrix using the same design :

mm=model.matrix(~ Patient.ID + Treatment * ( Status + Response ),colData(dds))
mm=mm[,which(colSums(mm)>0)]
dds=DESeq(dds,full = mm)

But I still get the same error. I was wondering whether it is wise to use ignoreRank = T when I build my dataset?

Thanks a lot!

Best,

Alex

1

You should upgrade to the latest version of DESeq2 which is version 12. For a few releases now, the "model matrix not full rank" error points people to a specific place in the vignette where you can find the solution.

Hello Michael,

Thank you for your comment! I am running DESeq2 version 1.12.3. The complete error I am getting is as followed :

using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

vignette('DESeq2')

Thank you very much for your input!

Hi Mike,

I read the instruction about the interaction term and nested individuals but I am still unclear what design I should use and what parameters to input in the DESeq() function. This is mainly because the number of responders and non-responders is different in my design, which is not the case in the DESEq2 vignette.

df=data.frame( Patient.ID = factor(rep(1:9,each=2)),
Treatment = factor(rep(c("Pre","On"),9),
levels=c("Pre","On")),
Response = factor(c(rep(("N"),10),rep("Y",8)),
levels=c("N","Y")),
Status = factor(c(rep("Naive",4),
rep("Second",4),
rep("Naive",8),
rep("Second",2)),
levels=c("Naive","Second")))

> df


   Patient.ID Treatment Response Status Patient.n
1           1       Pre        N  Naive         1
2           1        On        N  Naive         1
3           2       Pre        N  Naive         2
4           2        On        N  Naive         2
5           3       Pre        N Second         3
6           3        On        N Second         3
7           4       Pre        N Second         4
8           4        On        N Second         4
9           5       Pre        N  Naive         5
10          5        On        N  Naive         5
11          6       Pre        Y  Naive         1
12          6        On        Y  Naive         1
13          7       Pre        Y  Naive         2
14          7        On        Y  Naive         2
15          8       Pre        Y  Naive         3
16          8        On        Y  Naive         3
17          9       Pre        Y Second         4
18          9        On        Y Second         4

Is there any special parameters that should be considered before runninf the DESEq() function in order to get the difference between the Response status looking at the Pre/On treamtment delta?

Thank you very much!

Because you have patient.n up to 5 in non-responders, you will have to remove an empty column which will be the patient.n = 5 effect in responders. You can do this by forming the model matrix, removing the column (which will have all zeros) and then supplying the model matrix to the 'full' argument of DESeq(). You can form the model matrix with:

mm <- model.matrix(*insert design here*,*insert data frame here*)

I think the design you want is then ~status + response + response:patient.n + response:treatment.

Thank you very much Michael for the guidance!

Does this means that patient 5 will be eliminated from the response analysis? In this case, isn't the choice of which patient is being excluded/included in the analysis be considered? If yes, let's say I have 100 patients instead, keeping the same proportion, permuting the patients that are being excluded in the design and keeping only common DEGs between the permutations might be a valid alternative?

Thanks again guys for your precious input!

Sorry, no, I'm not suggesting to remove patient 5. What I was trying to say is that, when you form the model matrix, look at the column names. And then you'll find that one of the columns is just a vector of 0's. The column you want to remove will be named something like "responseY.Patient.n5". You'll need to remove this column before providing the model matrix to 'full' in DESeq().

Dealing with columns of 0's in the model matrix is also described in the DESeq2 vignette. See Section 3.12.2 "Levels without samples".

Answer: DESEq2 Paired samples Before and after treatment
0
3.2 years ago by

Thank you very much for all the help Michael!

After following your instructions, I first tested the design which worked well.

~Response+ Response:Patient.n + Response:Treatment

However, when I add status to the design, I have model that is not full ranked again. I tried to add a nested Status column to the design bit it was unsuccessful :

> df
Patient.ID Treatment Response Status Patient.n Status.n
1           1       Pre        N  Naive         1        1
2           1        On        N  Naive         1        1
7           4       Pre        N  Naive         4        2
8           4        On        N  Naive         4        2
9           5       Pre        N  Naive         5        3
10          5        On        N  Naive         5        3
11          6       Pre        Y  Naive         1        4
12          6        On        Y  Naive         1        4
13          7       Pre        Y  Naive         2        5
14          7        On        Y  Naive         2        5
3           2       Pre        N Second         2        1
4           2        On        N Second         2        1
5           3       Pre        N Second         3        2
6           3        On        N Second         3        2
15          8       Pre        Y Second         3        3
16          8        On        Y Second         3        3

colData(dds)=DataFrame(df)
design(dds)=~Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment
mm=model.matrix(~ Status +Response+ Response:Status.n + Response:Patient.n + Response:Treatment,colData(dds))
mm=mm[,colSums(mm)>0]
dds=DESeq(dds,full = mm,betaPrior = F)
Error in checkFullRank(full) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.

vignette('DESeq2')

Can a second nested factor be thrown in the design in order to take into consideration the fact that before treatment (Treatment.Pre), the baseline of Status is different between 2 cohort, like a batch effect?

Thanks again for your precious help!

Sorry, you are right. I wrote too quickly before. I was responding while sitting at a conference (Bioc2016 though...). I should have pointed out that status is already controlled for when you control for patient effects. You are comparing treatment effects within patient so you don't need to, or in another sense, already have controlled for any differences between status naive and status second samples.

You should just use a design of ~response + response:patient.n + response:treatment.

Answer: DESEq2 Paired samples Before and after treatment
0
3.2 years ago by

That makes sense indeed!

Thank you very much guys, this solved my problems and answered my questions.