Question

Which factors to include in design matrix and how to do it in a challenging experimental design

0

Barista • 0

@e51ccd8d

Last seen 3.5 years ago

Netherlands

Dear all,

I would like some advice on which covariates and factors to include in the design matrix for a human transcriptomics project. For context, we are comparing two conditions in a retrospective study with around 250 patients in each group. In total, around 500 whole blood samples were sequenced using Lexogen's QuantSeq technology. We are now trying to run a DE analysis, and I have reached the point where I have to decide which factors and covariates to include in the design matrix. We have baseline data for all patients (age, sex, comorbidities, lab values, etc.), and in a post on Biostars it was suggested that I should probably run a PCA and check whether these factors and covariates are the variables that separate the samples. If so, I should include them in the design matrix (see https://www.biostars.org/p/9494249/#9497484).

Furthermore, as this is a bulk RNA-seq experiment, I think we should also correct for the composition of the sequenced tissue, that is, for the white blood cell differential (numbers of basophils, neutrophils, lymphocytes, etc.). The suggestion on Biostars was to include the cell counts as a random effect in the edgeR model (edgeR is what I used for the DE analysis), but we were not sure how to do this. Does anyone have a suggestion for how to include these variables properly in edgeR?

Furthermore, if anyone has complementary advice on which variables to include and why, or a method on how to select them, please feel free to share it with me!
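The PCA check described in the question can be sketched roughly as follows. This is only an illustration on simulated data: the variable names (`logexpr`, `age`, `sex`), the sample sizes, and the choice of the first five components are all assumptions, not part of the original analysis. It computes, for each leading principal component, how much of its variance is explained by each candidate covariate.

```r
# All data below are simulated; names and sizes are hypothetical.
set.seed(1)
n_samples <- 40
n_genes   <- 300

age <- rnorm(n_samples, mean = 60, sd = 10)
sex <- factor(rep(c("F", "M"), times = n_samples / 2))

# Simulated log-expression matrix (genes x samples).
# The first 100 genes are given an age effect so there is something to detect.
logexpr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
age_z   <- as.numeric(scale(age))
logexpr[1:100, ] <- logexpr[1:100, ] + rep(age_z, each = 100)

# PCA with samples as rows
pca <- prcomp(t(logexpr), center = TRUE, scale. = FALSE)
pcs <- pca$x[, 1:5]

# R-squared from regressing each PC on each covariate
# (lm handles both numeric covariates and factors)
r2 <- sapply(list(age = age, sex = sex), function(v) {
  apply(pcs, 2, function(pc) summary(lm(pc ~ v))$r.squared)
})

round(r2, 2)
```

With real data, `logexpr` would typically be replaced by log-CPM values, for example from `edgeR::cpm(y, log = TRUE)`. A covariate that explains a large share of a leading component is a candidate for the design matrix, but this is a screening heuristic rather than a formal selection rule.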

RNAseq Design Matrix • 1.7k views

4.2 years ago • Barista • 0

score 1 · Answer 1 · 2021-11-12

1

Gordon Smyth 53k

@gordon-smyth

Last seen 1 hour ago

WEHI, Melbourne, Australia

Cell counts are a continuous variable, and continuous variables cannot be random effects. This is not specific to edgeR but rather a general principle: continuous variables and random effects are not the same thing.

If you want to include cell counts in the model, then the correct method is to include them as covariates in the design matrix, same as you include any other covariate.
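As a minimal sketch of what that could look like, the following builds a design matrix with cell-type percentages as ordinary covariates and, if edgeR is installed, fits a quasi-likelihood model on simulated counts. The sample table `targets`, its column names, and all values are made up for illustration; only the edgeR function names are real.

```r
# Simulated stand-in for the real data; every name and value here is hypothetical.
set.seed(42)
n_samples <- 20
n_genes   <- 200

# Sample annotation: condition of interest plus covariates
targets <- data.frame(
  condition   = factor(rep(c("A", "B"), each = n_samples / 2)),
  sex         = factor(rep(c("F", "M"), times = n_samples / 2)),
  age         = round(rnorm(n_samples, mean = 60, sd = 10)),
  neutrophils = runif(n_samples, 40, 75),  # percent of leukocytes
  lymphocytes = runif(n_samples, 15, 45)   # percent of leukocytes
)

# Centre continuous covariates so the intercept refers to an average sample
num_cols <- c("age", "neutrophils", "lymphocytes")
targets[num_cols] <- lapply(targets[num_cols], function(x) x - mean(x))

# Covariates enter the design like any other term; condition is listed last
design <- model.matrix(~ sex + age + neutrophils + lymphocytes + condition,
                       data = targets)

if (requireNamespace("edgeR", quietly = TRUE)) {
  counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 5),
                   nrow = n_genes,
                   dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
  y    <- edgeR::DGEList(counts = counts)
  keep <- edgeR::filterByExpr(y, design)
  y    <- y[keep, , keep.lib.sizes = FALSE]
  y    <- edgeR::calcNormFactors(y)
  y    <- edgeR::estimateDisp(y, design)
  fit  <- edgeR::glmQLFit(y, design)
  # Test the condition effect, adjusted for sex, age and cell composition
  res  <- edgeR::glmQLFTest(fit, coef = "conditionB")
  print(edgeR::topTags(res))
}
```

One caveat when using cell-type percentages: if every cell type is included and the percentages sum to 100, the covariates are collinear with the intercept, so at least one cell type has to be left out of the design.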

4.2 years ago • Gordon Smyth 53k

0

All right, point taken! Thanks for your advice!

Any other thoughts about the experimental design and which factors/covariates to include in the first place?

4.2 years ago • Barista • 0