Question: Effect of including PatientID in the design formula

Asked by o.giannakopoulou

Hello,

I am currently using DESeq2 to run some tests on a case/control RNA-seq dataset with measurements at 2 time points (i.e. each patient has 2 measurements and is classified as either case or control). Cases and controls are not matched/paired.

I have read the tutorial and any other relevant information I could find, but I can’t quite understand what role PatientID plays in the design formula. Most importantly, I can’t decide whether I need to add it to my design when looking for genes differentially expressed between cases and controls, or between the 2 time points. Is the PatientID term needed in the formula as a flag that the same patient appears more than once (at different time points), or does it correct for inter-individual variability in the model? I would really appreciate your help on this.

My question is more general, but I’m also including a snapshot of colData(dds) for your convenience:

	Phenotype	Visit	PatientID
	<factor>	<factor>	<factor>
Patient1FU	Control	second	1
Patient1R	Control	first	1
Patient2FU	Control	second	2
Patient2R	Control	first	2
...	...	...	...
Patient40FU	Case	second	1
Patient40R	Case	first	1
Patient99FU	Case	second	2
Patient99R	Case	first	2

Thank you,

Olga


Answer · James W. MacDonald · 2017-09-19

One way to think about this is in the context of a paired t-test. In a 'regular' t-test you compute the mean of each group and then compare the means to decide whether there is an underlying population difference between the two groups. But if the two groups are related (say you measured people's weights before and after they took a diet pill or placebo), then it is more accurate to use a paired t-test: you compute the difference in weight for each subject between before and after the treatment, and then test whether those differences are consistently nonzero.

This 'adjusts out' any subject-specific weights, and allows you to test for the thing you really care about (e.g., the change in weight after taking the diet pill or placebo). Algebraically, when you put a patient-level factor in a model, you are doing essentially the same thing; you 'adjust out' the patient-specific gene expression and are then left with just the difference in gene expression between the two visits (or whatever), which is what you really care about.
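Here is a minimal sketch of that equivalence, using made-up before/after weights. The numbers and function names are hypothetical, and it uses an ordinary least-squares model rather than DESeq2's negative binomial GLM, so it only illustrates the logic. It computes an unpaired (Welch) t statistic, a paired t statistic, and the t statistic for the visit coefficient in a model with one effect per subject. The subject effects are 'adjusted out' by centering each subject's two values on that subject's own mean. This within transformation gives the same visit coefficient as fitting the subject indicators explicitly.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical weights (kg) for 6 subjects before and after treatment.
# Subjects differ a lot from each other, but each loses roughly 1-2 kg.
before = [62.0, 95.0, 71.0, 110.0, 58.0, 83.0]
after = [60.5, 93.2, 70.1, 108.0, 56.9, 81.6]


def unpaired_t(a, b):
    """Welch two-sample t statistic: ignores which values belong to whom."""
    return (mean(b) - mean(a)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def paired_t(a, b):
    """Paired t statistic: tests the per-subject differences against zero."""
    d = [y - x for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def subject_factor_t(a, b):
    """t statistic for the visit effect in y ~ subject + visit (least squares).

    Each subject's effect is removed by centering that subject's two values on
    their mean; regressing centered y on the centered visit indicator gives the
    same visit coefficient as the full model with subject indicators.
    """
    n = len(a)
    xs, ys = [], []
    for y0, y1 in zip(a, b):
        m = (y0 + y1) / 2
        xs += [-0.5, 0.5]       # visit indicator (0, 1) minus its subject mean
        ys += [y0 - m, y1 - m]  # outcome minus its subject mean
    sxx = sum(x * x for x in xs)
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx
    rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    df = 2 * n - (n + 1)        # 2n observations, n subject effects + 1 visit effect
    se = sqrt(rss / df / sxx)
    return beta / se


print(f"unpaired t:       {unpaired_t(before, after):.3f}")
print(f"paired t:         {paired_t(before, after):.3f}")
print(f"subject-factor t: {subject_factor_t(before, after):.3f}")
```

The paired t and the subject-factor t agree, and both are much larger in magnitude than the unpaired t. The large between-subject spread in weight inflates the unpaired standard error but is removed once each subject serves as their own baseline.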


Thank you James for your reply. The t-test example helped me a lot in understanding the effect of including PatientID in the model. It's now clear how I should compare the expression of each group (cases/controls) between the two visits.

However, I still have some doubts about the model I should use to look for differential expression between cases and controls while controlling for differences in expression between the two visits, i.e. design(dds) <- formula(~Visit+Phenotype). In this case the comparison is between "unrelated" groups, even though colData(dds) includes two samples from each patient. So in this design, would you include PatientID in the model or not?

Thank you again. Your help is much appreciated.
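A small sketch of why Phenotype cannot simply sit next to a PatientID factor that has one level per patient. It uses a hypothetical 4-patient layout (patients A to D are made-up names) with treatment coding, and checks the column rank of the resulting design matrix. Phenotype is constant within each patient, so its column is a sum of patient indicator columns and cannot be estimated alongside them. DESeq2 reports designs like this as not full rank.

```python
from fractions import Fraction

# Hypothetical layout (not the real data): patients A and B are controls,
# C and D are cases, and every patient is sampled at both visits.
PATIENTS = [("A", "Control"), ("B", "Control"), ("C", "Case"), ("D", "Case")]
SAMPLES = [(p, ph, v) for p, ph in PATIENTS for v in ("first", "second")]


def model_matrix(with_phenotype):
    """Treatment-coded design: intercept, Visit, optional Phenotype, PatientID.

    PatientID has one level per patient; A is the reference level.
    """
    rows = []
    for patient, phenotype, visit in SAMPLES:
        row = [1, int(visit == "second")]
        if with_phenotype:
            row.append(int(phenotype == "Case"))
        row += [int(patient == p) for p in ("B", "C", "D")]
        rows.append(row)
    return rows


def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot left: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


for with_pheno in (False, True):
    mm = model_matrix(with_pheno)
    print(f"Phenotype included: {with_pheno}  columns: {len(mm[0])}  rank: {rank(mm)}")
```

Without Phenotype the 5 columns have rank 5. Adding Phenotype gives 6 columns but the rank stays at 5, because the Case column equals the C indicator plus the D indicator. This is the sense in which a between-patient factor is confounded with a per-patient PatientID factor.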

Reply by o.giannakopoulou


So you have cases and controls, and each patient had two visits, right? In that situation you usually want to know about the interaction (e.g., you want to know if the differences between visit 1 and visit 2 are the same for cases and controls or not). If this is correct, then there is quite a bit of exposition in the DESeq2 vignette that covers the situation.
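A minimal sketch of what that interaction measures, for a single gene on a log scale with made-up values. The numbers and helper names are hypothetical, and this is an ordinary least-squares model, not DESeq2's negative binomial GLM. Take a model with a patient effect, a visit effect and a case-by-visit interaction. Each patient's baseline is absorbed by the patient effect, and the interaction coefficient and its t statistic reduce to a pooled two-sample comparison of within-patient changes (second visit minus first) between cases and controls.

```python
from math import sqrt
from statistics import mean

# Hypothetical log-scale expression of one gene: (first visit, second visit) per patient.
controls = [(5.1, 5.3), (6.0, 6.1), (4.8, 5.1), (5.5, 5.6)]
cases = [(5.2, 6.0), (6.1, 6.8), (4.9, 5.9), (5.6, 6.3)]


def changes(pairs):
    """Within-patient change (second - first); each patient's own baseline cancels."""
    return [second - first for first, second in pairs]


def interaction(ctrl, case):
    """Difference of mean changes (case - control) and its pooled two-sample t statistic."""
    d0, d1 = changes(ctrl), changes(case)
    est = mean(d1) - mean(d0)
    ss = sum((x - mean(d0)) ** 2 for x in d0) + sum((x - mean(d1)) ** 2 for x in d1)
    df = len(d0) + len(d1) - 2
    se = sqrt(ss / df * (1 / len(d0) + 1 / len(d1)))
    return est, est / se


est, t = interaction(controls, cases)
print(f"interaction estimate: {est:.3f}, t = {t:.2f}")
```

A nonzero interaction means the visit-to-visit change differs between cases and controls. For the DESeq2 fit itself, the colData above recodes PatientID as 1, 2, ... within each Phenotype. That coding appears to match the vignette's section on individuals nested within groups, which uses a design of the form ~ grp + grp:ind.n + grp:cnd.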

Reply by James W. MacDonald


Thank you again, I really appreciate the help. I had gone through the vignette but was not sure which model was most appropriate for the analysis I had in mind. I'll now go through the DESeq2 vignette again in a more targeted way. Thank you again

Reply by o.giannakopoulou