
Hello, I'm attempting to run a GLM analysis with edgeR on 32 biologically independent samples with 3 factors: sex, age (time point), and genotype. I'd like to create a full design matrix, but I'm having some trouble doing this and understanding which contrasts are possible.

```r
sex = rep(c("m","f"), each=2, times=8)
time = rep(c("p0","p2","p6","p12"), each=8)
geno = factor(rep(c("WT","KO"), times=16))
geno = relevel(geno, ref="WT")

full_design = model.matrix(~0 + sex*geno*time)
colnames(full_design)
#  [1] "sexf"                "sexm"                "genoKO"
#  [4] "timep12"             "timep2"              "timep6"
#  [7] "sexm:genoKO"         "sexm:timep12"        "sexm:timep2"
# [10] "sexm:timep6"         "genoKO:timep12"      "genoKO:timep2"
# [13] "genoKO:timep6"       "sexm:genoKO:timep12" "sexm:genoKO:timep2"
# [16] "sexm:genoKO:timep6"
```
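A quick base-R check of the setup may help when reading these column names. It is a sketch that assumes the same `sex`, `time` and `geno` vectors as above: `model.matrix()` turns the character vector `time` into a factor with alphabetically sorted levels, which is why `timep12` comes before `timep2`, and why p0 (like WT, via `relevel()`) acts as the reference level.

```r
# Same vectors as in the post (assumed unchanged).
sex  <- rep(c("m", "f"), each = 2, times = 8)
time <- rep(c("p0", "p2", "p6", "p12"), each = 8)
geno <- relevel(factor(rep(c("WT", "KO"), times = 16)), ref = "WT")

# model.matrix() converts character vectors with factor(), which sorts the
# levels alphabetically: "p12" lands before "p2", and "p0" is the reference.
levels(factor(time))
# [1] "p0"  "p12" "p2"  "p6"

# Each of the 2 x 2 x 4 = 16 sex/genotype/time cells holds 2 of the 32 samples.
cell_counts <- table(sex, geno, time)
all(cell_counts == 2)
# [1] TRUE
```

If the natural order is preferred, `factor(time, levels = c("p0", "p2", "p6", "p12"))` can be used instead; p0 stays the reference and only the column order changes.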

From what I understand of the edgeR manual, the first 6 columns of the design matrix are the coefficients of each factor level, measured against the baseline levels of the other factors (so "sexf" is "f" at "p0" with a "WT" genotype, "timep12" is "f" at "p12" with a "WT" genotype, and so on). Is this correct? And is this design matrix the best way to measure the differences between my samples? Thank you very much for your help.
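One way to read the columns is to list, for a given sample, which columns are 1 in its row: on the GLM's log scale, that sample's expected value is the sum of those coefficients (plus the library-size offset). The sketch below assumes the vectors from the post; `active_columns()` is a hypothetical helper written only for this illustration.

```r
sex  <- rep(c("m", "f"), each = 2, times = 8)
time <- rep(c("p0", "p2", "p6", "p12"), each = 8)
geno <- relevel(factor(rep(c("WT", "KO"), times = 16)), ref = "WT")
full_design <- model.matrix(~0 + sex * geno * time)

# Hypothetical helper: names of the columns equal to 1 in the row of the
# first sample that belongs to the given sex/genotype/time cell.
active_columns <- function(s, g, t) {
  r <- full_design[which(sex == s & geno == g & time == t)[1], ]
  names(r)[r == 1]
}

active_columns("f", "WT", "p0")
# [1] "sexf"
active_columns("f", "WT", "p12")
# [1] "sexf"    "timep12"
active_columns("m", "KO", "p12")
# "sexm" "genoKO" "timep12" "sexm:genoKO" "sexm:timep12"
# "genoKO:timep12" "sexm:genoKO:timep12"
```

Read this way, `sexf` is the female WT level at p0, while `timep12` is the change from p0 to p12 within WT females rather than the p12 level itself.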

EDIT: Looking at the design matrix, it seems that my previous understanding was mistaken. The first column, "sexf", is a length-32 vector with 1s in all rows where the sex is "f", so it is equivalent to `rep(c(0,1),each=2,times=8)`.
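The indicator interpretation in the edit can be checked directly. This sketch, using the same assumed vectors, also shows that each interaction column is the element-wise product of the corresponding main-effect columns.

```r
sex  <- rep(c("m", "f"), each = 2, times = 8)
time <- rep(c("p0", "p2", "p6", "p12"), each = 8)
geno <- relevel(factor(rep(c("WT", "KO"), times = 16)), ref = "WT")
full_design <- model.matrix(~0 + sex * geno * time)

# "sexf" is a 0/1 indicator of the female samples.
all(full_design[, "sexf"] == rep(c(0, 1), each = 2, times = 8))
# [1] TRUE

# Interaction columns are products of the main-effect indicators.
all(full_design[, "sexm:genoKO"] ==
      full_design[, "sexm"] * full_design[, "genoKO"])
# [1] TRUE

# Column sums count the samples flagged by each column:
# 16 females, and 2 samples in the male/KO/p12 cell.
colSums(full_design)[c("sexf", "sexm:genoKO:timep12")]
```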

In this case, edgeR does have information on all samples with sex "f", but I'm not sure how one would find the difference between "sexf" and "sexm" at p0, since "sexm:timep0" doesn't exist in the "full_design" matrix. Am I correct in understanding that the design matrix puts a 1 in every row (sample) to which a column name applies? And if so, how does one make all possible comparisons with the "full_design" matrix? It seems like it isn't full at all.
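A minimal base-R sketch of how a male-vs-female comparison at p0 could be expressed with `full_design` as it stands, assuming the vectors from the post. Because p0 and WT are reference levels, no `timep0` column is needed: in WT the comparison is `sexm - sexf`, and in KO it also needs `sexm:genoKO`. The group means are made-up random numbers used only to confirm that each contrast recovers the intended difference; `con()` is a hypothetical helper.

```r
sex  <- rep(c("m", "f"), each = 2, times = 8)
time <- rep(c("p0", "p2", "p6", "p12"), each = 8)
geno <- relevel(factor(rep(c("WT", "KO"), times = 16)), ref = "WT")
full_design <- model.matrix(~0 + sex * geno * time)

# Made-up log-scale means, one per sex/genotype/time cell (illustration only).
set.seed(1)
cell <- paste(sex, geno, time, sep = ".")
cell_means <- setNames(rnorm(16), unique(cell))
y <- cell_means[cell]  # noise-free response for all 32 samples

# Least-squares coefficients; 16 cells and 16 columns, so the fit is exact.
beta <- qr.solve(full_design, y)

# Hypothetical helper: contrast vector that is zero except for named columns.
con <- function(...) {
  v <- setNames(numeric(ncol(full_design)), colnames(full_design))
  w <- c(...)
  v[names(w)] <- w
  v
}

# Male vs female at p0, WT: both p0 and WT are baselines.
mf_p0_WT <- con(sexf = -1, sexm = 1)
# Male vs female at p0, KO: the sex-by-genotype interaction also enters.
mf_p0_KO <- con(sexf = -1, sexm = 1, `sexm:genoKO` = 1)

target_WT <- unname(cell_means["m.WT.p0"] - cell_means["f.WT.p0"])
target_KO <- unname(cell_means["m.KO.p0"] - cell_means["f.KO.p0"])
all.equal(sum(mf_p0_WT * beta), target_WT)
# [1] TRUE
all.equal(sum(mf_p0_KO * beta), target_KO)
# [1] TRUE
```

In edgeR, a vector like `mf_p0_WT` would typically be passed as the `contrast` argument of `glmQLFTest()` (after `glmQLFit()`) or `glmLRT()` (after `glmFit()`). An alternative parameterization is to combine the three factors into one 16-level group factor, fit `~0 + group`, and write each comparison as a difference of two group columns, for example with limma's `makeContrasts()`.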