2.9 years ago by
Zentrum für Molekularbiologie, Universität Heidelberg
There are several conceptually different questions you may want to ask:
1. For a given time point, and a given gene: Does the expression of the gene at this time point correlate with BMI?
2. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the initial BMI?
3. For two time points and a given gene: Does the change in expression from the first to the second time point correlate with the change in BMI?
For 1, you should fit the data for each time point separately. Otherwise, DESeq2 will assume your 15 libraries to be measurements from 15 independent samples. In reality, the 3 expression measurements from a subject are correlated, and neglecting this fact will inflate the type-I error. For each time point, use the design ~ X. Do not include Person: with only one sample per subject, fitting a coefficient for each person will absorb all differences between subjects, leaving nothing for X to explain.
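To make the point about Person concrete, here is a minimal base-R sketch that only builds the design matrices. The sample table, the column names (Person, time, X) and the BMI values are made up for illustration, and X is assumed to be one BMI value per subject.

```r
# Hypothetical sample table: 5 subjects x 3 time points = 15 libraries.
# X is the BMI, assumed constant within a subject; all values are made up.
bmi <- c(22.1, 27.4, 31.0, 24.8, 29.3)
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 3)),
  time   = factor(rep(c("t1", "t2", "t3"), times = 5)),
  X      = rep(bmi, each = 3)
)

# Question 1: keep a single time point, so each subject contributes one library.
cd_t1 <- droplevels(coldata[coldata$time == "t1", ])
design_ok <- model.matrix(~ X, data = cd_t1)

# Adding Person gives one coefficient per subject. With one library per
# subject these absorb all between-subject differences, so X is not estimable.
design_bad <- model.matrix(~ Person + X, data = cd_t1)

rank_ok  <- qr(design_ok)$rank   # equals ncol(design_ok): full rank
rank_bad <- qr(design_bad)$rank  # smaller than ncol(design_bad): rank deficient
```

With DESeq2, the count columns for that time point and cd_t1 would then go into DESeqDataSetFromMatrix with design = ~ X, once per time point.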
(If you want to use all data at once, you would need a so-called mixed-effects model, which DESeq2 does not support. The 'duplicateCorrelation' function of limma/voom can account for such repeated-measures correlations and might be an alternative here.)
For 2, it is best to include only the samples from the two relevant time points and use the design ~ Person + X:time. The Person term will remove the base-level expression (i.e., expression at the first time point) and leave only the differences in expression between time points, which the X:time term then relates to X.
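A rough base-R sketch of what this design looks like, again with made-up annotations. One assumption here goes beyond the text above: time is coded as a numeric 0/1 indicator, so that X:time becomes a single column. If time is a factor, R expands X:time into one column per time point, and because X is constant within a subject the result is not full rank.

```r
# Hypothetical annotations: 5 subjects, two time points, made-up BMI values.
bmi <- c(22.1, 27.4, 31.0, 24.8, 29.3)
coldata <- data.frame(
  Person = factor(rep(paste0("P", 1:5), each = 2)),
  time   = rep(c(0, 1), times = 5),  # numeric: 0 = first, 1 = second time point
  X      = rep(bmi, each = 2)
)

# Person absorbs each subject's baseline expression; X:time is one column
# that is zero at the first time point and equal to X at the second.
design <- model.matrix(~ Person + X:time, data = coldata)
full_rank <- qr(design)$rank == ncol(design)
slope_col <- design[, "X:time"]

# For comparison: the same formula with time as a factor yields two
# interaction columns whose sum (X) lies in the span of the Person columns.
design_f <- model.matrix(~ Person + X:factor(time), data = coldata)
deficient <- qr(design_f)$rank < ncol(design_f)
```

The coefficient on the X:time column is then the one to test: it describes how the change in expression between the two time points scales with X.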
For 3, do the same, but replace X with the change in BMI.
And don't be too surprised if you get nothing. 5 subjects sounds way too few to see anything for such a question.