Question

Different results from DGE using different subgroups in model.matrix

Fabian • 0

Hi, I am examining whether gene expression differs between patients with a biomarker below versus above the median. Let's call that variable "group". I want to examine DGE between patients below and above the median in different subgroups: Sex, Diabetes status, Age (below and above 60 years).

However, when investigating Diabetes status, for example, I get different DGE results with the following two design matrices:

model.matrix( ~ 0 + group : Diabetes)
model.matrix( ~ 0 + group : Diabetes + Sex + Agecutat60)

I wanted to use the second model.matrix because then I wouldn't have to estimate the dispersion for every single subgroup with this code:

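# 'design' is one of the model.matrix() results shown above
y <- DGEList(GenewiseCounts,
             group = group,
             genes = GenewiseCounts[, 1, drop = FALSE])
# Estimate the common dispersion, then the genewise (tagwise) dispersions
y <- estimateGLMCommonDisp(y, design, verbose = TRUE)
y <- estimateGLMTagwiseDisp(y, design)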

However, I don't necessarily want to "adjust" for those other variables; I would rather keep them in the design matrix so that I can change the contrasts later on. I am unsure why the results differ between the two design matrices. Is it wrong to keep them in my model.matrix?

edgeR • 469 views

updated 13 months ago by Gordon Smyth • written 13 months ago by Fabian

score 0 · Answer 1 · 2024-03-03

It is hard to tell what you are trying to do. What does your group variable represent? How does it relate to Sex, Diabetes and Age? You refer to subgroups defined by Sex, Diabetes and Age, but your analysis does not divide patients into subgroups. Your model formula includes an interaction term but without the corresponding main effects.
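To make the point about the interaction term concrete, here is a minimal base-R sketch that builds both formulas from the question on a toy sample sheet and prints what each one parameterises. The `targets` data frame and its level names ("low"/"high", "under60"/"over60") are invented for illustration and are not the poster's data.

```r
# Hypothetical sample sheet: every combination of the four factors once.
targets <- expand.grid(
  group      = c("low", "high"),        # biomarker below / above the median
  Diabetes   = c("no", "yes"),
  Sex        = c("F", "M"),
  Agecutat60 = c("under60", "over60")
)

# Design 1: interaction term only, no intercept.
design1 <- model.matrix(~ 0 + group:Diabetes, data = targets)

# Design 2: the same interaction plus Sex and age terms.
design2 <- model.matrix(~ 0 + group:Diabetes + Sex + Agecutat60, data = targets)

# Compare the coefficients each design defines.
print(colnames(design1))
print(colnames(design2))

# Column count versus numerical rank; a rank below the column count
# means some coefficients are not separately estimable.
print(c(columns = ncol(design1), rank = qr(design1)$rank))
print(c(columns = ncol(design2), rank = qr(design2)$rank))
```

With the first formula there is one column per group-by-Diabetes combination. Adding Sex and Agecutat60 changes the set of coefficients, and therefore what any contrast between them means, so differing DE results between the two designs are to be expected.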

Your post seems to show some misunderstandings about how an edgeR analysis is conducted. The design matrix must include all the relevant factors and there is no need to estimate dispersions for subgroups separately. You cannot "adjust" the dispersion estimates for covariates but not do the same for the DE analysis. I suggest you go back to the edgeR User's Guide and try to follow a standard analysis.
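As a rough sketch of what a single design with all factors might look like, the code below combines group and Diabetes into one factor, keeps Sex and Agecutat60 as additive terms, and expresses each subgroup comparison as a contrast. Everything here is an assumption for illustration: the toy `targets` sheet, the simulated counts, the level names, and the choice of `estimateDisp` with the quasi-likelihood pipeline instead of the functions used in the question. The edgeR part only runs if the package is installed.

```r
set.seed(1)

# Hypothetical sample sheet (not the poster's data).
targets <- expand.grid(
  group      = c("low", "high"),
  Diabetes   = c("no", "yes"),
  Sex        = c("F", "M"),
  Agecutat60 = c("under60", "over60")
)

# One combined factor with a level per group-by-Diabetes combination.
targets$cell <- interaction(targets$group, targets$Diabetes, sep = ".")

# Single design matrix: cell means plus additive Sex and age effects.
design <- model.matrix(~ 0 + cell + Sex + Agecutat60, data = targets)
colnames(design) <- sub("^cell", "", colnames(design))

# Contrasts: high vs low biomarker within each Diabetes subgroup.
cn <- colnames(design)
con_nondiabetic <- as.numeric(cn == "high.no")  - as.numeric(cn == "low.no")
con_diabetic    <- as.numeric(cn == "high.yes") - as.numeric(cn == "low.yes")

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated counts stand in for real data.
  counts <- matrix(rnbinom(200 * nrow(targets), mu = 50, size = 10),
                   nrow = 200,
                   dimnames = list(paste0("gene", 1:200), NULL))
  y <- edgeR::DGEList(counts)
  y <- edgeR::calcNormFactors(y)
  # Dispersions are estimated once, from all samples and the full design.
  y <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmQLFit(y, design)
  # The same fit is reused for each subgroup comparison.
  res_nondiabetic <- edgeR::glmQLFTest(fit, contrast = con_nondiabetic)
  res_diabetic    <- edgeR::glmQLFTest(fit, contrast = con_diabetic)
  print(edgeR::topTags(res_nondiabetic, n = 3))
}
```

The dispersion estimation and the DE tests share the same design matrix here, so the covariates enter both steps, and changing the question later only means changing the contrast vector.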

If you defined your subgroups and scientific questions more clearly, I think the analysis would be more straightforward than you are currently finding it.