Question

Changes in expression associated with changes in cell count using limma

p.figuerola.9 • 0

@pfiguerola9-23262

Hi, first of all I would like to thank you in advance for your help and suggestions.

I'm studying the changes in gene expression of 70 runners before and after a mountain race. The main objective is to study the changes in expression not explained by the cell counts (either because they come from cell types not included in the model, or because they come from free RNA in plasma). We also want to identify the changes in expression associated with each blood cell type.

Data

Expression values before and after the race, measured in peripheral whole blood.
Cell counts for the main blood cell types (erythrocytes, neutrophils, monocytes, basophils, eosinophils and platelets).
Some possible confounding variables (age, sex, running time).
Three race categories of roughly equal size, defined by race distance (14 km, 35 km and 55 km).

Alternatives

1. First approach:

deltaG ~ 1 + deltaErythrocytes + deltaNeutrophils + ... + deltaPlatelets

where

deltaG = (GEafter - GEbefore)/GEbefore

deltaCell = (CellAfter - CellBefore)/CellBefore

In other words, the response is the change in gene expression relative to the initial expression, and the predictors are the relative changes in each blood cell count. The main problem with this approach is that the intercept also captures the difference in expression caused by changes in expression within each cell type. The cell-count coefficients then describe only the changes in expression associated with changes in cell count, not the association between expression and cell count more broadly.
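As a minimal sketch of the two ratios defined above (Python is used only for illustration; the helper name `relative_change` and all numbers are hypothetical, not data from the study):

```python
def relative_change(before, after):
    """Return (after - before) / before for each paired measurement."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return [(a - b) / b for b, a in zip(before, after)]


# Made-up expression values for one gene in three runners.
ge_before = [100.0, 80.0, 50.0]
ge_after = [120.0, 60.0, 50.0]
delta_g = relative_change(ge_before, ge_after)

# Made-up neutrophil counts for the same three runners.
cell_before = [3.0, 2.5, 4.0]
cell_after = [6.0, 5.0, 4.0]
delta_neutrophils = relative_change(cell_before, cell_after)
```

The same helper serves for deltaG and for every deltaCell term, since both are defined as a change relative to the pre-race value.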

2. Second approach:

G ~ PRE + POST + SubjectID + ErythrocytesCount + NeutrophilsCount + ... + PlateletsCount

This model has no intercept. PRE (before the race) and POST (after the race) are dummy variables (PRE=1 and POST=0 for pre-race samples, and vice versa for post-race samples), and SubjectID identifies the runner. I would then test the contrast between PRE and POST. The result of this contrast can be interpreted as the change in expression between PRE and POST that is not explained by the cell counts.

The problem with this approach is that I can't study the differences in expression associated with each cell type. This model fits a single coefficient for each cell type, regardless of whether the sample is pre- or post-race.
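To make the layout of the second model concrete, here is a rough sketch of its design matrix and of the PRE/POST contrast, written in plain Python for illustration only. The function names, the two-runner toy data and the choice to drop the first runner's dummy column are my assumptions; in practice the design would normally be built in R with `model.matrix`.

```python
def second_approach_design(samples, subjects, cell_types):
    """Return (column names, rows) for G ~ 0 + PRE + POST + SubjectID + counts."""
    # The first subject's dummy is dropped: PRE + POST already sums to 1 in
    # every row, so a full set of subject dummies would be collinear with it.
    kept = subjects[1:]
    names = ["PRE", "POST"] + ["Subject" + s for s in kept]
    names += [c + "Count" for c in cell_types]
    rows = []
    for smp in samples:
        row = [float(smp["time"] == "PRE"), float(smp["time"] == "POST")]
        row += [float(smp["subject"] == s) for s in kept]
        row += [float(smp["counts"][c]) for c in cell_types]
        rows.append(row)
    return names, rows


def post_minus_pre(names):
    """Contrast vector that picks out POST - PRE and ignores other columns."""
    weights = {"PRE": -1.0, "POST": 1.0}
    return [weights.get(n, 0.0) for n in names]


# Hypothetical toy data: two runners, two cell types (values are made up).
cell_types = ["Erythrocytes", "Neutrophils"]
samples = [
    {"subject": "A", "time": "PRE", "counts": {"Erythrocytes": 4.8, "Neutrophils": 3.1}},
    {"subject": "A", "time": "POST", "counts": {"Erythrocytes": 4.6, "Neutrophils": 7.9}},
    {"subject": "B", "time": "PRE", "counts": {"Erythrocytes": 5.1, "Neutrophils": 2.7}},
    {"subject": "B", "time": "POST", "counts": {"Erythrocytes": 5.0, "Neutrophils": 6.4}},
]
names, design = second_approach_design(samples, ["A", "B"], cell_types)
contrast = post_minus_pre(names)
```

Each cell type contributes exactly one column here, which is why this model cannot separate a pre-race slope from a post-race slope.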

3. Third approach:

G ~ PRE + POST + SubjectID + ErythrocytesCount:PRE + ErythrocytesCount:POST + ...

With this model I could test the contrast between PRE and POST (as in the second model), and also the contrast between the PRE and POST coefficients of each cell type. This model could be more accurate because most cell counts change significantly between before and after the race and therefore follow different distributions at the two time points. The only drawback I see is that it doubles the number of cell-count covariates (two per cell type), which reduces statistical power.
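The doubling of covariates can be seen in a small sketch of the interaction columns (plain Python, for illustration only). The function names and the count values are hypothetical, and the SubjectID columns are left out to keep the example short.

```python
CELL_TYPES = ["Erythrocytes", "Neutrophils", "Monocytes",
              "Basophils", "Eosinophils", "Platelets"]


def third_approach_columns(cell_types):
    """Column names: PRE, POST, then one PRE and one POST slope per cell type."""
    names = ["PRE", "POST"]
    for c in cell_types:
        names += [c + "Count:PRE", c + "Count:POST"]
    return names


def third_approach_row(time, counts, cell_types):
    """Design row for one sample; each count is routed to its own time column."""
    is_pre = float(time == "PRE")
    is_post = float(time == "POST")
    row = [is_pre, is_post]
    for c in cell_types:
        row += [counts[c] * is_pre, counts[c] * is_post]
    return row


def slope_contrast(names, cell_type):
    """Contrast vector: POST slope minus PRE slope for a single cell type."""
    weights = {cell_type + "Count:PRE": -1.0, cell_type + "Count:POST": 1.0}
    return [weights.get(n, 0.0) for n in names]


names = third_approach_columns(CELL_TYPES)

# Made-up post-race counts for one runner.
counts = {"Erythrocytes": 4.6, "Neutrophils": 7.9, "Monocytes": 0.6,
          "Basophils": 0.03, "Eosinophils": 0.1, "Platelets": 250.0}
post_row = third_approach_row("POST", counts, CELL_TYPES)
neutrophil_contrast = slope_contrast(names, "Neutrophils")
```

With six cell types this gives 12 cell-count columns instead of 6, and each cell type gets its own Cell:POST minus Cell:PRE contrast.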

Summing up

My questions are:

Should I discard the first approach?
Between the second and third approaches, which do you think is more appropriate? Or is there a more appropriate alternative?
Should I include the confounding variables in the model? I worry that including them would reduce statistical power even further, given that the third model already includes 15 explanatory variables and I have only 140 samples (70 runners measured twice).
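For the power concern, a back-of-the-envelope tally of coefficients against samples may help. The sketch below (plain Python, for illustration) assumes SubjectID enters as fixed-effect dummy columns, one per runner minus one, which is my assumption about how the formulas above would be coded; the function name is hypothetical.

```python
def residual_df(n_subjects, n_cell_types, per_time_slopes):
    """Return (samples, coefficients, residual df) for a paired pre/post design."""
    n_samples = 2 * n_subjects          # each runner is measured PRE and POST
    time_cols = 2                       # PRE and POST
    subject_cols = n_subjects - 1       # one dummy per runner, one dropped
    slope_cols = n_cell_types * (2 if per_time_slopes else 1)
    n_coef = time_cols + subject_cols + slope_cols
    return n_samples, n_coef, n_samples - n_coef


# 70 runners and 6 cell types, as described in the Data section.
second = residual_df(70, 6, per_time_slopes=False)
third = residual_df(70, 6, per_time_slopes=True)
```

Under that assumption the subject columns, not the cell-count columns, account for most of the coefficients in both models.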

Any comments, observations or advice are very welcome. Thank you very much in advance.

Kind regards,

Pol

limma model selection several covariables pre-post whole blood • 566 views

score 0 · Answer 1 · 2020-04-08

Gordon Smyth 52k

@gordon-smyth

WEHI, Melbourne, Australia

Yes, you should discard the first approach.

I don't have time to give you detailed statistical advice but, in general, if you want to adjust for covariates you simply include them in the model and analyse the data as usual.

If you want to test for DE on a cell-type-specific basis, this is a specialist topic and several papers have been written about it. I don't have a standard approach that I recommend.

Thank you very much for your recommendation. I will follow the third approach and try to compare it with other methods for testing DE on a cell-type-specific basis. Kind regards,

Pol
