Question: Effect of inclusion of patientID in the formula
2.2 years ago, o.giannakopoulou wrote:


I am currently using DESeq2 to run some tests on a case/control RNA-seq dataset with two time-point measurements (i.e., each patient was sampled twice and is classified as either case or control). Cases and controls are not matched/paired.

I have read the tutorial and all the other relevant information I could find, but I can't quite understand the role of PatientID in the design formula. Most importantly, I can't decide whether I need to add it to my design when I am looking for differentially expressed genes between cases and controls, or between the two time points. Is the PatientID term essential in the formula as a flag that the same patient appears more than once (at different time points), or does it correct for inter-individual variability in the model? I would really appreciate your help on this.

My question is more general, but I'm also including a snapshot of colData(dds) for your convenience:

             Phenotype    Visit PatientID
              <factor> <factor>  <factor>
Patient1FU     Control   second         1
Patient1R      Control    first         1
Patient2FU     Control   second         2
Patient2R      Control    first         2
...                ...      ...       ...
Patient40FU       Case   second         1
Patient40R        Case    first         1
Patient99FU       Case   second         2
Patient99R        Case    first         2

Thank you,







Answer: Effect of inclusion of patientID in the formula
2.2 years ago, James W. MacDonald (United States) wrote:

One way to think about this is in the context of a paired t-test. In a 'regular' t-test you compute the mean of each group and then compare the means to see if you think there is an underlying population difference between the two groups. But if the two groups are somehow related (say you measured people's weights before and after they took a diet pill or placebo), then it would be more accurate to use a paired t-test, where you compute the difference in weight for each subject before and after the treatment, and then test to see if there is a consistent difference.
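To make the comparison concrete, here is a minimal Python sketch of the two tests described above. The weights are made-up numbers for illustration only, and `welch_t` and `paired_t` are hypothetical helper names, not part of any library. The point is only that the paired statistic works on per-subject differences, so large differences between subjects do not inflate its standard error.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical weights (kg) for five subjects, before and after treatment.
# Subjects differ a lot from each other, but each changes by a similar amount.
before = [82.0, 95.0, 70.0, 110.0, 64.0]
after = [80.0, 92.5, 68.5, 107.0, 62.5]


def welch_t(x, y):
    """Unpaired (Welch) t statistic: compares the two group means."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def paired_t(x, y):
    """Paired t statistic: a one-sample t statistic on per-subject differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_unpaired = welch_t(before, after)
t_paired = paired_t(before, after)
print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

Both statistics have the same numerator (the mean weight change); only the standard error differs, because the paired version ignores the spread between subjects.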

This 'adjusts out' any subject-specific weights, and allows you to test for the thing you really care about (e.g., the change in weight after taking the diet pill or placebo). Algebraically, when you put a patient-level factor in a model, you are doing essentially the same thing; you 'adjust out' the patient-specific gene expression and are then left with just the difference in gene expression between the two visits (or whatever), which is what you really care about.
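As a rough illustration of 'adjusting out' the patient, the Python sketch below subtracts each patient's own mean from their two values and shows that the visit effect estimated from the adjusted values matches the mean within-patient difference. The expression values are invented, and this is an ordinary linear-model analogy on a balanced two-visit design; DESeq2 itself fits a negative binomial GLM, so treat it as intuition rather than a description of what DESeq2 computes.

```python
from statistics import mean

# Hypothetical log2 expression of one gene: patient -> (first visit, second visit).
expr = {
    "P1": (5.0, 5.6),
    "P2": (8.1, 8.8),
    "P3": (6.4, 6.9),
    "P4": (9.3, 9.9),
}

# "Adjust out" the patient effect by subtracting each patient's own mean.
centered = {
    p: (first - mean((first, second)), second - mean((first, second)))
    for p, (first, second) in expr.items()
}

# Visit effect estimated from the patient-adjusted values...
visit_effect_adjusted = (
    mean(c[1] for c in centered.values()) - mean(c[0] for c in centered.values())
)
# ...matches the mean of the within-patient differences (the paired comparison).
visit_effect_paired = mean(second - first for first, second in expr.values())

# Spread of first-visit values before and after removing the patient level.
raw_range = max(f for f, _ in expr.values()) - min(f for f, _ in expr.values())
adj_range = max(c[0] for c in centered.values()) - min(c[0] for c in centered.values())

print(f"visit effect: {visit_effect_adjusted:.3f} (adjusted), {visit_effect_paired:.3f} (paired)")
print(f"first-visit range: {raw_range:.2f} raw, {adj_range:.2f} after adjustment")
```

The estimate of the visit effect is unchanged; what shrinks is the patient-to-patient spread that would otherwise be counted as noise.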


Thank you, James, for your reply. The t-test example helped me a lot in understanding the effect of including PatientID in the model. It's now clear how I should compare the expression of each group (cases/controls) between the two visits.

However, I still have some doubts about the model I should use to look for differential expression between cases and controls while controlling for differences in expression between the two visits, i.e. design(dds) <- formula(~ Visit + Phenotype). In this case the comparison is between "unrelated" groups, even though colData(dds) includes samples from the same patient. So in this example, would you include PatientID in the model or not?

Thank you again. Your help is much appreciated.

written 2.2 years ago by o.giannakopoulou

So you have cases and controls, and each patient had two visits, right? In that situation you usually want to know about the interaction (e.g., you want to know if the differences between visit 1 and visit 2 are the same for cases and controls or not). If this is correct, then there is quite a bit of exposition in the DESeq2 vignette that covers the situation.
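For intuition about what the interaction measures, here is a small Python sketch that computes it as a difference of differences: the average within-patient change between visits in cases, minus the same quantity in controls. The values and the helper name `mean_change` are hypothetical, and this is plain arithmetic on made-up log2 values, not the DESeq2 model.

```python
from statistics import mean

# Hypothetical log2 expression of one gene: (first visit, second visit) per patient.
controls = [(5.0, 5.1), (6.2, 6.2), (7.1, 7.3)]
cases = [(5.4, 6.5), (6.0, 7.0), (7.3, 8.2)]


def mean_change(group):
    """Average within-patient change from the first to the second visit."""
    return mean(second - first for first, second in group)


change_controls = mean_change(controls)
change_cases = mean_change(cases)

# Interaction: how much the visit effect in cases differs from that in controls.
interaction = change_cases - change_controls
print(
    f"controls: {change_controls:.2f}, cases: {change_cases:.2f}, "
    f"interaction: {interaction:.2f}"
)
```

An interaction near zero would mean the two groups change by about the same amount between visits; a value far from zero suggests the visit effect depends on phenotype.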

written 2.2 years ago by James W. MacDonald

Thank you again, I really appreciate the help. I have gone through the vignette, but I was not sure which model was the most appropriate for the analysis I had in mind. I'll now go through the DESeq2 vignette in a more targeted way.

written 2.2 years ago by o.giannakopoulou

