Question: Design: DESeq2 and continuous variables
sasha0 wrote (2.8 years ago):

Dear community,

I'm trying to use DESeq2 to test whether the expression of any gene is related to X (a continuous variable). I have 5 persons, and samples were taken from each of them at 3 different time points. I think the design would be:

~ Persons + time + X

Is this correct?

Thank you


This depends on what X is. Is it a property of the sample or of the person? What is it?

(Please be specific about the biology when asking such questions. Statistics is not as abstract as people think.)

Hi Simon,

X is BMI (body mass index). This is a pregnancy study. We want to answer two questions: a. Does BMI at the beginning of the pregnancy influence the expression of any gene? b. Do different "weight gain levels" (i.e., kilograms gained) during pregnancy influence the expression of any gene?

Thanks and sorry again

Answer: Design: DESeq2 and continuous variables
Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg) wrote (2.8 years ago):

There are several conceptually different questions you may want to ask:

1. For a given time point, and a given gene: Does the expression of the gene at this time point correlate with BMI?

2. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the initial BMI?

3. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the change in BMI?

For 1, you should fit the data for each time point separately. This is because DESeq2 will assume your 15 libraries to be measurements from 15 independent samples. In reality, the 3 expression measurements from a subject are correlated, and neglecting this fact will inflate the type-I error rate. Then use ~ X. Do not include Person: if you fit a coefficient for each person, this will absorb all differences between subjects, leaving nothing for X to explain.
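The point about Person absorbing the between-subject differences can be checked with base R alone. The sketch below uses a made-up sample table (the person labels, time labels, and BMI values are assumptions for illustration, not data from this thread) and compares the rank of the design matrix for the full model from the question with that of the per-time-point model.

```r
# Hypothetical sample table: 5 persons x 3 time points, baseline BMI per person.
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("t1", "t2", "t3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # made-up BMI values
)

# Full design from the question: X is constant within each person, so it is
# a linear combination of the intercept and the Person indicator columns.
mm_full   <- model.matrix(~ Person + time + X, data = coldata)
rank_full <- qr(mm_full)$rank
ncol(mm_full)   # number of coefficients requested
rank_full       # number that can actually be estimated

# Design for question 1: a single time point, BMI only.
first   <- coldata[coldata$time == "t1", ]
mm_q1   <- model.matrix(~ X, data = first)
rank_q1 <- qr(mm_q1)$rank
ncol(mm_q1)
rank_q1
```

With BMI constant within each person, the full design asks for one more coefficient than its rank allows, whereas `~ X` on a single time point is full rank. In DESeq2 terms this corresponds to subsetting the dataset to one time point and using `design = ~ X`.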

(If you want to use all data at once, you would need a so-called mixed-effects model, which DESeq2 does not support. The 'duplicateCorrelation' function of limma/voom can account for such repeated-measures correlations and might be an alternative here.)
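A minimal sketch of that limma/voom route might look as follows. It assumes the limma package is installed and uses simulated counts and made-up BMI values in place of real data; the sample table, gene names, and the choice of `~ time + X` as fixed effects are assumptions for illustration only.

```r
library(limma)

set.seed(1)

# Hypothetical sample table: 5 persons x 3 time points, baseline BMI per person.
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("t1", "t2", "t3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # made-up BMI values
)

# Simulated counts (200 genes x 15 samples) standing in for real data.
counts <- matrix(rnbinom(200 * 15, mu = 100, size = 10), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:15)))

# Fixed effects: time and BMI. Person is treated as a blocking factor instead
# of receiving its own coefficients.
design <- model.matrix(~ time + X, data = coldata)

# Transform counts to log-CPM with precision weights.
v <- voom(counts, design)

# Estimate the consensus within-person correlation across genes.
corfit <- duplicateCorrelation(v, design, block = coldata$Person)

# Fit the linear model, accounting for the within-person correlation.
fit <- lmFit(v, design, block = coldata$Person,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Genes ranked by evidence for an association with BMI.
topTable(fit, coef = "X", number = 5)
```

Because the counts here are simulated without any person or BMI effect, the estimated correlation and the top table carry no biological meaning; the sketch only shows how the pieces fit together.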

For 2, it is best to include only the two relevant time points in the data and use ~ Person + X:time. The Person term removes the base-level expression (i.e., expression at the first time point), leaving only the differences in expression between time points.

For 3, do the same, but replace X with the change in BMI.
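Before running either design, it can help to check whether the chosen encoding of time gives an estimable model matrix. The base-R sketch below does this for questions 2 and 3 on a made-up sample table; the BMI values, the `dBMI` column, and the 0/1 indicator `late` are all assumptions introduced for illustration and do not come from the thread.

```r
# Hypothetical sample table: 5 persons, first and last time point only.
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 2)),
  time   = factor(rep(c("t1", "t3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 2),  # made-up baseline BMI
  dBMI   = rep(c(3.1, 4.4, 2.0, 5.2, 3.7), each = 2)        # made-up change in BMI
)

# Numeric 0/1 indicator for the later time point (an assumed encoding).
coldata$late <- as.numeric(coldata$time == "t3")

# Question 2 with time as a factor: the columns X:timet1 and X:timet3 sum to X,
# which the Person columns already span, so one coefficient is not estimable.
mm_factor   <- model.matrix(~ Person + X:time, data = coldata)
rank_factor <- qr(mm_factor)$rank

# Question 2 with the 0/1 indicator: a single X:late column.
mm_q2   <- model.matrix(~ Person + X:late, data = coldata)
rank_q2 <- qr(mm_q2)$rank

# Question 3: same structure, with the change in BMI in place of X.
mm_q3   <- model.matrix(~ Person + dBMI:late, data = coldata)
rank_q3 <- qr(mm_q3)$rank

# Number of non-estimable coefficients under each encoding.
c(factor = ncol(mm_factor) - rank_factor,
  q2     = ncol(mm_q2) - rank_q2,
  q3     = ncol(mm_q3) - rank_q3)
```

In this toy table the factor encoding is rank deficient, and DESeq2 rejects designs whose model matrix is not full rank. Coding time as a 0/1 indicator is one possible workaround, not something the answer prescribes, and whether to add a main effect for time is a separate modelling choice.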

And don't be too surprised if you get nothing. 5 subjects sounds way too few to see anything for such a question.

Hi Simon,

Thanks a lot for your answer, it helped me to understand things a lot better.

Sasha
