Question

Differential gene expression associated with change in blood parameter following diet

anna.cot.anna.cot ▴ 30

Hi,

I am trying to figure out which genes are associated with the change in blood concentration of a metabolite of interest following a diet. I have paired samples (before and after the diet), and I need to adjust for age, sex and possibly body mass index (BMI).

Here is an example of how my data are organized:

Sample  ID  Timepoint  Metabolite  change_metabolite  Sex  Age   BMI
1a      1   1          10,3000     0,0000             1    56,5  38,1
1b      1   2          20,1000     9,8000             1    57,7  27,2
2a      2   1          11,0000     0,0000             2    21    44
2b      2   2          28,7000     17,7000            2    22,2  25,8
3a      3   1          12,4000     0,0000             1    30    33,1
3b      3   2          65,8000     53,4000            1    31,3  30
4a      4   1          112,0000    0,0000             1    67    31,5
4b      4   2          100,7000    -11,3000           1    68    29,7
5a      5   1          36,2000     0,0000             1    53,5  36,8
5b      5   2          89,1000     52,9000            1    54,5  32,9
6a      6   1          12,9000     0,0000             2    25,7  40,4
6b      6   2          29,0000     16,1000            2    26,7  37,6
7a      7   1          15,1000     0,0000             2    44,8  35,7
7b      7   2          98,2000     83,1000            2    45,9  23,1
8a      8   1          8,0000      0,0000             1    25,4  29,9
8b      8   2          11,5600     3,5600             1    26,6  24,8

On top of that, my data come from 3 different health centers (health_center), so I need to adjust for that too.

How can I do this with edgeR? Does the following make sense?

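library(edgeR)

# expr: matrix of raw counts (genes x samples)
y <- DGEList(counts=expr)
# keep: logical gene filter defined earlier (not shown in this post)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y)
design <- model.matrix(~ ID + Timepoint + Sex + Age + BMI + health_center + change_metabolite)
# estimateDisp() expects the design matrix itself, not a data frame
yf <- estimateDisp(y, design, robust=TRUE)
fit <- glmQLFit(yf, design)
qlf <- glmQLFTest(fit, coef="change_metabolite")
topTags(qlf)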

Thanks for your suggestions!

edger • 875 views

updated 4.6 years ago by Gordon Smyth • written 4.6 years ago by anna.cot.anna.cot

score 3 · Answer 1 · 2020-09-16

You say you have paired samples (before and after diet). The whole purpose of pairing is to control for factors such as age, sex, BMI and health center, so you do not need to do a paired analysis and add all those factors to the model as well. That would just be doubling up.

If the paired samples have been constructed properly, then you just need:

design <- model.matrix( ~ ID + Timepoint)

So it is all much simpler than what you're currently doing. This is the standard format for paired experiments.
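For reference, here is a minimal sketch of how the paired design might slot into the quasi-likelihood pipeline from the question. The sample table `targets` and the simulated count matrix `expr` are hypothetical stand-ins, not the poster's data, and the use of `filterByExpr()` is an assumption because the original filter is not shown. `ID` and `Timepoint` are coded as factors so that each patient gets its own baseline.

```r
library(edgeR)

set.seed(1)

# Hypothetical sample table: 8 patients, each measured before (1) and after (2) the diet
targets <- data.frame(
  ID        = factor(rep(1:8, each = 2)),
  Timepoint = factor(rep(1:2, times = 8))
)

# Simulated placeholder counts (200 genes x 16 samples), for illustration only
expr <- matrix(
  rnbinom(200 * 16, mu = 50, size = 5), nrow = 200,
  dimnames = list(paste0("gene", 1:200), paste0(targets$ID, c("a", "b")))
)

# Paired design: one baseline per patient plus a common diet effect
design <- model.matrix(~ ID + Timepoint, data = targets)

y <- DGEList(counts = expr)
keep <- filterByExpr(y, design)          # assumed filtering step
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design, robust = TRUE)
fit <- glmQLFit(y, design, robust = TRUE)

# Test the after-vs-before effect within patients
qlf <- glmQLFTest(fit, coef = "Timepoint2")
topTags(qlf)
```

The coefficient `Timepoint2` is the within-patient log-fold-change of timepoint 2 relative to timepoint 1, so it answers "which genes change following the diet" rather than "which genes track the metabolite".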

You cannot include change_metabolite in the model. Including patient-specific variables in the model is incompatible with a paired analysis.
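One way to see the incompatibility is to check the rank of the design matrix. The sketch below rebuilds only the `ID`, `Timepoint` and `Sex` columns from the example table in the question (the data frame name `targets` is hypothetical) and uses base R only. Because `Sex` is constant within each patient, its column is a sum of patient columns and adds no new information.

```r
# Sample table mirroring the ID, Timepoint and Sex columns of the example data
targets <- data.frame(
  ID        = factor(rep(1:8, each = 2)),
  Timepoint = factor(rep(1:2, times = 8)),
  Sex       = factor(rep(c(1, 2, 1, 1, 1, 2, 2, 1), each = 2))
)

paired  <- model.matrix(~ ID + Timepoint, data = targets)
doubled <- model.matrix(~ ID + Timepoint + Sex, data = targets)

# The paired design is full rank
ncol(paired)        # 9 columns
qr(paired)$rank     # rank 9

# Adding a patient-level covariate adds a column but no rank
ncol(doubled)       # 10 columns
qr(doubled)$rank    # rank 9, so the Sex coefficient is not estimable
```

A rank-deficient design like `doubled` cannot be fitted as it stands, which is the "doubling up" described above.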

Even if you included Metabolite in a completely different non-paired analysis, the Metabolite concentration would need to be on a log scale. Taking differences of unlogged Metabolite concentrations (as you have done to get the change variable) is not a meaningful thing to do.
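To illustrate the log-scale point with the Metabolite values from the example table (decimal commas converted to points), here is a small base R sketch. The variable names are hypothetical, and this is only one way of expressing the change on a log scale.

```r
# Metabolite concentrations for patients 1 to 8, taken from the example table
before <- c(10.3, 11.0, 12.4, 112.0, 36.2, 12.9, 15.1, 8.0)
after  <- c(20.1, 28.7, 65.8, 100.7, 89.1, 29.0, 98.2, 11.56)

# Raw difference, as in the change_metabolite column
raw_change <- after - before

# Change on the log2 scale, i.e. the log fold change log2(after / before)
log2_change <- log2(after) - log2(before)

data.frame(patient = 1:8, raw_change, log2_change = round(log2_change, 2))
```

Patients 1 and 4 have raw changes of similar size (9.8 and -11.3), yet patient 1 roughly doubles while patient 4 drops by about 10%. The log-scale change separates these two cases, whereas the raw difference does not.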

We have not tested edgeR on metabolic data but the QL pipeline will probably work ok.