Hi, I am examining whether there is a difference in gene expression between patients with a biomarker below and above the median. Let's call that variable "group". I want to examine the DGE between patients below and above the median biomarker level within different subgroups: Sex, Diabetes status, and Age (below and above 60 years).
However, when investigating Diabetes status, for example, I get different results from my DGE analysis depending on which of the following design matrices I use:
model.matrix( ~ 0 + group : Diabetes)
model.matrix( ~ 0 + group : Diabetes + Sex + Agecutat60)
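To make the comparison concrete, here is a minimal base-R sketch that builds both designs on toy data and lists their columns. The data frame `toy` and all level labels are made up for illustration, and it assumes all four variables are two-level factors. Comparing the number of columns with the rank shows whether every coefficient in a design can actually be estimated.

```r
# Toy sample annotation: one row per combination of the four factors.
# All level labels are hypothetical, not taken from the real data.
toy <- expand.grid(
  group      = c("below", "above"),
  Diabetes   = c("no", "yes"),
  Sex        = c("F", "M"),
  Agecutat60 = c("under60", "over60"),
  stringsAsFactors = TRUE
)

# The two designs from the question
design_a <- model.matrix(~ 0 + group:Diabetes, data = toy)
design_b <- model.matrix(~ 0 + group:Diabetes + Sex + Agecutat60, data = toy)

# Which coefficients does each design contain?
print(colnames(design_a))
print(colnames(design_b))

# Number of columns versus rank; a rank below the column count means
# the design is not of full rank
print(c(columns = ncol(design_a), rank = qr(design_a)$rank))
print(c(columns = ncol(design_b), rank = qr(design_b)$rank))
```

Running the same checks on the real design matrices (instead of `toy`) should show which columns the extra covariates add and how the group:Diabetes columns are coded in each case.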
I wanted to use the second design matrix because then I wouldn't have to estimate the dispersion separately for every single subgroup with this code:
y <- DGEList(GenewiseCounts,
             group = group,
             genes = GenewiseCounts[, 1, drop = FALSE])
y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
y <- estimateGLMTagwiseDisp(y, design)
However, I don't necessarily want to "adjust" for those other variables; I would rather keep them in the design matrix so that I can change the contrasts later on. I am unsure what to make of the differing results between the two design matrices. Is it wrong to keep the other variables in my design matrix?
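For reference, this is roughly what I mean by changing the contrasts later: a minimal base-R sketch of contrast vectors for a four-column group-by-Diabetes design. The column names in `cells` are hypothetical stand-ins (the real ones come from colnames(design)), and the vectors only match a design with exactly these four columns; with extra covariate columns they would need additional zero entries.

```r
# Hypothetical column names for the four group-by-Diabetes cells;
# the real names would come from colnames(design)
cells <- c("below_noDM", "above_noDM", "below_DM", "above_DM")

# Above vs below the biomarker median within diabetic patients
con_dm <- setNames(c(0, 0, -1, 1), cells)

# Above vs below the biomarker median within non-diabetic patients
con_nodm <- setNames(c(-1, 1, 0, 0), cells)

# Difference between the two subgroup effects (interaction-type contrast)
con_diff <- con_dm - con_nodm

print(rbind(con_dm, con_nodm, con_diff))
```

A vector like this is what would be passed as the `contrast` argument of edgeR's glmLRT after fitting the model.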