Design: DESeq2 and continuous variables
sasha • 0
@sasha-11847
Last seen 6.8 years ago

Dear community,

I'm trying to use DESeq2 to test whether there is a relation between the expression of any gene and X (a continuous variable). My design has 5 persons, and samples were taken from each of them at 3 different times. I think the design would be:

~ Persons + time + X

Is this correct?

Thank you

 

 


This depends on what X is. Is it a property of the sample or of the person? What is it?

(Please be specific about the biology when asking such questions. Statistics is not as abstract as people think.)


Hi Simon,

Sorry about that :(

X is BMI (body mass index). This is a pregnancy study. We want to answer two things: a. Does BMI at the beginning of the pregnancy influence the expression of any gene? b. Do different "weight gain levels" (i.e., kilograms gained) during pregnancy influence the expression of any gene?

Thanks and sorry again

Simon Anders ★ 3.7k
@simon-anders-3855
Last seen 3.7 years ago
Zentrum für Molekularbiologie, Universi…

There are several conceptually different questions you may want to ask:

1. For a given time point, and a given gene: Does the expression of the gene at this time point correlate with BMI?

2. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the initial BMI?

3. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the change in BMI?

For 1, you should fit the data for each time point separately. This is because DESeq2 will assume your 15 libraries to be measurements from 15 independent samples. In reality, the 3 expression measurements from one subject are correlated, and neglecting this fact will inflate the type-I error rate. Then use ~ X. Do not include Person: if you fit a coefficient for each person, this will absorb all differences between subjects, leaving nothing for X to explain.
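
To make the point about Person concrete, here is a minimal base-R sketch. The sample sheet `coldata`, the subject labels and the BMI values are all made up for illustration, and `counts_t1` in the commented DESeq2 lines is a hypothetical count matrix for the first time point. It checks the rank of the model matrix for one time point with and without Person.

```r
# Hypothetical sample sheet: 5 subjects x 3 time points (BMI values are made up).
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("T1", "T2", "T3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # baseline BMI
)

# Question 1: keep a single time point, so every sample is a different subject.
t1 <- droplevels(coldata[coldata$time == "T1", ])

# Design ~ X: intercept plus one slope for BMI.
mm_x <- model.matrix(~ X, data = t1)
qr(mm_x)$rank == ncol(mm_x)   # TRUE: the design is estimable

# Design ~ Person + X: one coefficient per subject already fits all 5 samples,
# so the BMI column adds no new information (rank stays below the column count).
mm_px <- model.matrix(~ Person + X, data = t1)
c(columns = ncol(mm_px), rank = qr(mm_px)$rank)

# With DESeq2 installed and a count matrix for these samples, the fit would be
# along these lines (counts_t1 is hypothetical):
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts_t1, colData = t1, design = ~ X)
# dds <- DESeq2::DESeq(dds)
# res <- DESeq2::results(dds, name = "X")
```

The same subsetting would be repeated for each of the other time points.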

(If you want to use all data at once, you would need a so-called mixed-effects model, which DESeq2 does not support. The 'duplicateCorrelation' function of limma/voom does allow you to account for such repeated-measures correlations and might be an alternative here.)
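
As a rough sketch of that limma/voom route, assuming the limma and statmod packages are installed: the counts below are simulated noise, the sample sheet and BMI values are made up, and the design `~ time + X` is only one plausible choice, not something prescribed above. Person enters as the blocking variable rather than as a fixed effect.

```r
set.seed(1)

# Hypothetical sample sheet: 5 subjects x 3 time points (BMI values are made up).
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("T1", "T2", "T3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # baseline BMI
)

# Simulated counts standing in for real data: 200 genes x 15 libraries.
counts <- matrix(rnbinom(200 * 15, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))

# Fixed effects only; the subject enters through 'block' below.
design <- model.matrix(~ time + X, data = coldata)

if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  v      <- limma::voom(counts, design)
  # Estimate the within-subject correlation shared across genes.
  corfit <- limma::duplicateCorrelation(v, design, block = coldata$Person)
  fit    <- limma::lmFit(v, design, block = coldata$Person,
                         correlation = corfit$consensus.correlation)
  fit    <- limma::eBayes(fit)
  top    <- limma::topTable(fit, coef = "X")  # genes ranked by association with BMI
}
```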

For 2, it is best to include only the two relevant time points in the analysis and use ~ Person + X:time. The Person term removes the base-level expression (i.e., expression at the first time point), leaving only the differences in expression between time points.
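
One practical detail when building this design in R, sketched here with a made-up sample sheet (subject labels, BMI values and the choice of T1 versus T3 are assumptions): R expands `X:time` into one BMI slope per time level. The two slopes add up to X itself, which is constant within a person and therefore already covered by the Person coefficients. The sketch drops the first-time-point slope so that the remaining `X:timeT3` column describes how the change in expression depends on initial BMI.

```r
# Hypothetical sample sheet: 5 subjects x 3 time points (BMI values are made up).
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("T1", "T2", "T3"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # initial BMI
)

# Keep only the two time points being compared.
two <- droplevels(coldata[coldata$time %in% c("T1", "T3"), ])

mm <- model.matrix(~ Person + X:time, data = two)
colnames(mm)                                 # includes "X:timeT1" and "X:timeT3"
c(columns = ncol(mm), rank = qr(mm)$rank)    # one column is redundant

# Drop the redundant first-time-point slope to obtain a full-rank matrix.
mm_full <- mm[, colnames(mm) != "X:timeT1"]
qr(mm_full)$rank == ncol(mm_full)            # TRUE

# With DESeq2 installed, the matrix can be supplied as the design
# (counts_two is a hypothetical count matrix for these 10 samples):
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts_two, colData = two, design = mm_full)
# dds <- DESeq2::DESeq(dds)
# DESeq2::resultsNames(dds)   # look up the name of the BMI-by-time coefficient
```

Adding a main effect of time (`~ Person + time + X:time`) is a variation, not part of the answer above, that would let the average change between the time points be estimated separately from the BMI-dependent part.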

For 3, do the same, but replace X with the change in BMI.
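
A small sketch of how the change in BMI could be derived from per-sample measurements, assuming BMI was recorded at both time points. The data frame `two`, the column name `dBMI` and all values are hypothetical.

```r
# Hypothetical BMI at the two time points being compared (values are made up).
two <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 2)),
  time   = factor(rep(c("T1", "T3"), times = 5)),
  BMI    = c(21.5, 25.0, 24.0, 26.5, 27.3, 31.0, 30.1, 32.2, 33.8, 37.9)
)

# Change in BMI per person (T3 minus T1); rows are ordered by person,
# so the two subsets line up subject by subject.
delta <- setNames(two$BMI[two$time == "T3"] - two$BMI[two$time == "T1"],
                  levels(two$Person))

# Attach the per-person change to both of that person's samples.
two$dBMI <- unname(delta[as.character(two$Person)])

# dBMI then takes the place of X in the design, i.e. ~ Person + dBMI:time.
```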

 

And don't be too surprised if you get nothing. Five subjects sounds like far too few to see anything for such a question.


Hi Simon,

Thanks a lot for your answer, it helped me to understand things a lot better.

Sasha
