Question: How to control for age and gender effects in expression data using DESeq2
15 days ago by
klalit18030 wrote:

I have a dataset in which I want to see the effect of a drug on patients who did or did not respond to treatment. I collected their blood at three time points (visits). For each patient I have age and sex information. Because blood was collected at three visits, I used DESeq2 to perform a time-series differential expression analysis. I want to control for age and gender effects so that I can see the interaction between responder group and time point. Here is the sample table and my DESeq2 design formula:

sample  Phenotype     visit  Age  Gender
1       NonResponder  1      42   female
2       NonResponder  2      42   female
3       NonResponder  3      42   female
4       NonResponder  1      49   female
5       NonResponder  2      49   female
6       NonResponder  3      49   female
7       NonResponder  1      27   male
8       NonResponder  2      27   male
9       NonResponder  3      27   male
10      Responder     1      77   female
11      Responder     2      77   female
12      Responder     3      77   female
13      Responder     1      51   male
14      Responder     2      51   male
15      Responder     3      51   male
16      Responder     1      47   male
17      Responder     2      47   male
18      Responder     3      47   male

So which design should I use to control for age and gender effects in my data?

design 1:

design(dds) <- ~ age + gender + visit + phenotype + visit:phenotype + age:phenotype + gender:phenotype

design 2:

dds <- DESeq(dds, test = "LRT", reduced = ~ age + gender)

I would highly appreciate help with this.



modified 13 days ago by jennietodd1410 • written 15 days ago by klalit18030

Dear Dr. Michael Love,

Thank you so much for the reply. Actually, I split age into two parts: I found the median age in my data and used age as a factor variable, defining it as above or below the median. So age now has two groups in my dataset: above the median and below the median. I also saw your previous reply to another post and used age in this way. My design uses age as you suggested.

However, I could not understand which design I should use. So you are suggesting that I use the second design

dds <- DESeq(dds, test = "LRT", reduced = ~ age + gender)

as this will control for the age and gender variables while I am testing the effect of phenotype and visit interaction. But I have a question: if I use the second design, I will find genes where visit or phenotype have any effect, but what are the chances that these genes are not also affected by age and gender?

Or do you think that by using this second design I have already controlled for the age and gender effects in my data?

Best Regards,




written 13 days ago by klalit18030

It controls for age and gender.

written 13 days ago by Michael Love16k

Thank you so much for reply.

Many thanks,


written 13 days ago by klalit18030
14 days ago by
Michael Love16k wrote:

The second design is what is typically done to "control" for certain variables while testing the effect of others.

That will find any genes where visit or phenotype have any effect, including only an interaction.
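
To make concrete which terms such a likelihood ratio test covers, here is a minimal sketch that uses base R's model.matrix() to list the coefficients present in a full design but absent from the reduced one. It assumes the full design is ~ age + gender + visit + phenotype + visit:phenotype (the thread does not pin down the full design at this point), that age has been split at the median into two groups as described in the comments, and that the columns match the sample table in the question. No DESeq2 function is called here.

```r
# Sample table from the question: 6 patients x 3 visits = 18 samples.
coldata <- data.frame(
  phenotype = factor(rep(c("NonResponder", "Responder"), each = 9)),
  visit     = factor(rep(1:3, times = 6)),
  age       = rep(c(42, 49, 27, 77, 51, 47), each = 3),
  gender    = factor(rep(c("female", "female", "male",
                           "female", "male", "male"), each = 3))
)

# Assumption: age is replaced by a two-level factor split at the median.
coldata$age <- factor(
  ifelse(coldata$age > median(coldata$age), "above", "below"),
  levels = c("below", "above")
)

# Assumed full design and the reduced design from the thread.
full    <- ~ age + gender + visit + phenotype + visit:phenotype
reduced <- ~ age + gender

X_full    <- model.matrix(full, coldata)
X_reduced <- model.matrix(reduced, coldata)

# Coefficients in the full model but not in the reduced model:
# these are the terms the LRT evaluates jointly.
tested <- setdiff(colnames(X_full), colnames(X_reduced))
print(tested)

# Sanity check that the assumed full design is estimable (full column rank).
print(qr(X_full)$rank == ncol(X_full))
```

Under these assumptions the age and gender columns appear in both model matrices, so they are fitted in both models and only the visit, phenotype, and interaction columns are tested. With DESeq2, the reduced formula would be supplied as in the call posted in the question, DESeq(dds, test="LRT", reduced=~age+gender).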

I don't like to add "age" alone to the design as it implies that log gene expression increases linearly with years of life. For simple modeling, where users aren't familiar with more advanced approaches, I'd recommend breaking age into 3-5 groups using cut() and using this factor variable instead. Here you probably only have enough samples to cut age into 3 groups.
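
A minimal sketch of the cut() approach, using the six patient ages from the sample table in the question. The name age_group, the tertile breakpoints, and the level labels are assumptions chosen for illustration; the thread does not specify how the groups should be defined.

```r
# Ages of the six patients from the sample table (one value per patient).
patient_age <- c(42, 49, 27, 77, 51, 47)

# Assumption: split at the empirical tertiles so the groups are equal in size.
breaks <- quantile(patient_age, probs = seq(0, 1, length.out = 4))

# include.lowest = TRUE keeps the youngest patient inside the first bin.
age_group <- cut(patient_age, breaks = breaks,
                 labels = c("low", "mid", "high"),
                 include.lowest = TRUE)

# Each patient contributes three visits, so repeat the patient-level
# values once per sample to build the per-sample columns.
coldata <- data.frame(
  sample    = 1:18,
  age       = rep(patient_age, each = 3),
  age_group = rep(age_group, each = 3)
)

print(table(age_group))
```

The resulting factor could then stand in for the numeric age column in the design formula, for example ~ age_group + gender in the reduced model.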

modified 14 days ago • written 14 days ago by Michael Love16k

Would using paste() to combine the visit and age and phenotype factors in one with 36 levels be another option?

written 13 days ago by tkapell0

Dear Dr. Michael,

As I discussed with you, I used this design for differential expression analysis.

design (a)
dds <- DESeq(dds, test = "LRT", reduced = ~ age + gender)

However, I have one question: should I include the interactions of gender and age with phenotype in the design and then use a reduced formula, e.g.

design (b)

design(dds) <- ~ age + gender + visit + phenotype + visit:phenotype + age:phenotype + gender:phenotype
dds <- DESeq(dds, test = "LRT", reduced = ~ age + gender)


design (c)

design(dds) <- ~ age + gender + visit + phenotype + visit:phenotype + age:phenotype + gender:phenotype
dds <- DESeq(dds, test = "LRT", reduced = ~ age + gender + visit + phenotype)

In this experiment I am interested in seeing the gene expression difference between phenotypes across the different visits, and I also want to correct my data for age and gender.

So can you please suggest which design I should follow, and the reason behind it?

Best Regards,





written 8 days ago by klalit18030

Dear Lalit,

At this point, you should really consult a statistician about your analysis plan and how to interpret various designs. There's nothing specific about DESeq2 here; these are linear models that would be the same even if you were doing a simple linear regression. The problem is that there is a nearly limitless number of designs, and your choice among them should be motivated by a discussion with a statistician. I need to reserve my time on the support forum for software-related questions, and while I try to help out a bit by pointing people in the right direction, the use of complex designs as you have here requires more time and a face-to-face meeting with a local statistician.

written 8 days ago by Michael Love16k