Question

Design: DESeq2 and continuous variables

sasha • 0

@sasha-11847

Last seen 6.8 years ago

Dear community,

I'm trying to use DESeq2 to test whether there is a relationship between the expression of any gene and X (a continuous variable). My design has 5 persons, and samples were taken from each of them at 3 different times. I think the design would be:

~ Persons + time + X

Is this correct?

Thank you

deseq2 • 2.9k views

ADD COMMENT • link updated 7.4 years ago by Simon Anders ★ 3.7k • written 7.4 years ago by sasha • 0

This depends on what X is. Is it a property of the sample or of the person? What is it?

(Please be specific about the biology when asking such questions. Statistics is not as abstract as people think.)

ADD REPLY • link 7.4 years ago Simon Anders ★ 3.7k

Hi Simon,

Sorry about that :(

X is BMI (body mass index). This is a pregnancy study. We want to answer two things: a. Does the BMI at the beginning of the pregnancy have an influence on the expression of any gene? b. Do different "weight gain levels" (i.e., kilograms gained) during pregnancy have an influence on the expression of any gene?

Thanks and sorry again

ADD REPLY • link 7.4 years ago sasha • 0

score 2 · Accepted Answer · 2016-11-15

There are several conceptually different questions you may want to ask:

1. For a given time point, and a given gene: Does the expression of the gene at this time point correlate with BMI?

2. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the initial BMI?

3. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the change in BMI?

For 1, you should fit the data for each time point separately. This is because DESeq2 will assume your 15 libraries to be measurements from 15 independent samples. In reality, the 3 expression measurements from one subject are correlated, and neglecting this fact will inflate the type-I error. For each time point, use the design ~ X. Do not include Person: if you fit a coefficient for each person, this will absorb all differences between subjects, leaving nothing for X to explain.
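The point about Person can be checked in base R by looking at the design matrices for a single time point. This is only a sketch: the sample table and the BMI values in it are made up for illustration, and the column names Person, time and X are assumed to match the formulas in this thread.

```r
# Hypothetical sample table: 5 persons x 3 time points.
# X is an invented BMI value per person (not data from the study).
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), times = 3)),
  time   = factor(rep(c("t1", "t2", "t3"), each = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), times = 3)
)

# One sample table per time point, each with one row per person.
by_time <- split(coldata, coldata$time)
t1 <- by_time[["t1"]]

# Design ~ X: an intercept plus a slope for X,
# leaving 5 - 2 = 3 residual degrees of freedom.
mm_x <- model.matrix(~ X, data = t1)

# Design ~ Person + X: the person coefficients alone already fit
# each of the 5 samples exactly, so X adds no new information.
mm_px <- model.matrix(~ Person + X, data = t1)

dim(mm_x)        # rows x columns of the ~ X design
dim(mm_px)       # more columns than samples
qr(mm_px)$rank   # rank is capped at the number of samples
```

Each per-time-point table could then be paired with the matching count columns and given to DESeq2 with the design ~ X.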

(If you want to use all data at once, you would need a so-called mixed-effects model, which DESeq2 does not support. The 'duplicateCorrelation' function of limma/voom does allow you to account for such repeated-measures correlations and might be an alternative here.)
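A rough sketch of what the limma/voom route could look like, assuming the limma package is installed. The counts are simulated and the BMI values are invented, so nothing here reflects the actual study; the design ~ time + X is also an assumption chosen for illustration, not something prescribed above.

```r
library(limma)

set.seed(1)
n_genes <- 200

# Hypothetical layout: 5 persons, each sampled at 3 time points.
person <- factor(rep(paste0("P", 1:5), each = 3))
time   <- factor(rep(c("t1", "t2", "t3"), times = 5))
X      <- rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 3)  # invented BMI values

# Simulated counts with a per-person shift, so that the three
# samples from one person are correlated.
shift  <- matrix(rnorm(n_genes * 5, sd = 0.3), n_genes, 5)[, as.integer(person)]
counts <- matrix(rnbinom(n_genes * 15, mu = 200 * exp(shift), size = 10),
                 n_genes, 15)
rownames(counts) <- paste0("gene", seq_len(n_genes))

# Fixed effects only; Person is handled as a blocking factor instead.
design <- model.matrix(~ time + X)

# voom transformation, then estimate the within-person correlation.
v      <- voom(counts, design)
corfit <- duplicateCorrelation(v, design, block = person)

# Fit the linear model using the estimated consensus correlation.
fit <- lmFit(v, design, block = person,
             correlation = corfit$consensus.correlation)
fit <- eBayes(fit)

# Per-gene test of the X (BMI) coefficient.
top <- topTable(fit, coef = "X", number = Inf)
head(top)
```

Because Person enters only through the correlation structure, between-person differences in X remain estimable while the repeated measurements are no longer treated as independent.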

For 2, it is best to include only the two relevant time points in the data set and use ~ Person + X:time. The Person term will remove the base-level expression (i.e., expression at the first time point) and leave only the differences in expression between time points.

For 3, do the same, but replace X with the change in BMI.
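Before fitting the design for questions 2 and 3, it may be worth inspecting the model matrix in base R. The sketch below uses a made-up sample table (invented BMI values, assumed column names Person, time and X). When X is constant within each person, the formula as written expands into one X column per time point, and the two together are redundant with the Person columns, so one of them has to be dropped.

```r
# Hypothetical sample table: 5 persons, 2 time points.
# X is an invented initial BMI, constant within each person.
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 2)),
  time   = factor(rep(c("t1", "t2"), times = 5)),
  X      = rep(c(21.5, 24.0, 27.3, 30.1, 33.8), each = 2)
)

# Expand the design; X:time yields one X column per time level.
mm <- model.matrix(~ Person + X:time, data = coldata)
colnames(mm)
ncol(mm)       # number of coefficients requested
qr(mm)$rank    # number that can actually be estimated

# X:timet1 + X:timet2 equals X, which the Person columns already
# determine. Dropping the first-time-point column keeps only the
# term describing how the t1 -> t2 change depends on X.
mm_full <- mm[, colnames(mm) != "X:timet1"]
ncol(mm_full)
qr(mm_full)$rank
```

DESeq2's DESeq() accepts a user-supplied model matrix through its `full` argument, so a reduced matrix like `mm_full` could be passed that way; the coefficient of interest would then be the remaining `X:timet2` column. For question 3, X in the table would hold the change in BMI instead. Whether to add a `time` main effect as well, to allow an average change that does not depend on X, is a modelling choice the answer does not address.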

And don't be too surprised if you get nothing. 5 subjects sounds way too few to see anything for such a question.