I am trying to set up a model in which paired biological replicates act as a blocking factor. I am interested in differential expression in individual strains as well as between specific groups of strains that were isolated from similar environments. I am getting the "Design matrix not of full rank" error even though none of the columns in my design matrix are all 0s. Any advice on how to resolve this problem?
After reading in my data, I made the following variables:
rep <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))
group <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3))
strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))
I then created my design:
design <- model.matrix(~rep+group+strain, data=data)
which looks like this:
> design
   (Intercept) rep2 group2 group3 strain2 strain3 strain4 strain5 strain6
1            1    0      0      0       0       0       0       0       0
2            1    1      0      0       0       0       0       0       0
3            1    0      0      0       1       0       0       0       0
4            1    1      0      0       1       0       0       0       0
5            1    0      1      0       0       1       0       0       0
6            1    1      1      0       0       1       0       0       0
7            1    0      1      0       0       0       1       0       0
8            1    1      1      0       0       0       1       0       0
9            1    0      0      1       0       0       0       1       0
10           1    1      0      1       0       0       0       1       0
11           1    0      0      1       0       0       0       0       1
12           1    1      0      1       0       0       0       0       1
attr(,"assign")
[1] 0 1 2 2 3 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$rep
[1] "contr.treatment"
attr(,"contrasts")$group
[1] "contr.treatment"
attr(,"contrasts")$strain
[1] "contr.treatment"
I then made my DGEList, normalized it, and got an error while trying to estimate the dispersion:
y <- DGEList(counts=data[,1:12], group=group:strain:rep)
y <- calcNormFactors(y)
y <- estimateGLMCommonDisp(y,design)
> y <- estimateGLMCommonDisp(y,design)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset, :
Design matrix not of full rank. The following coefficients not estimable:
strain4 strain6
I also tried changing the order of the variables in the model (~rep+strain+group), which results in the same error, with the group2 and group3 coefficients reported as not estimable.
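For anyone following along, the dependency behind the error can be checked in base R. This is only a minimal sketch built from the factors defined above: each strain belongs to exactly one group, so the group columns can be rebuilt from the strain columns. The name design2 is introduced here purely for illustration.

```r
# Factors copied from the post above.
rep    <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))
group  <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3))
strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))

design <- model.matrix(~rep+group+strain)

# Fewer independent columns than columns in total: two are redundant.
ncol(design)      # 9
qr(design)$rank   # 7

# Each group column is the sum of the columns of the strains it contains,
# so no column needs to be all zeros for the matrix to lose rank.
all(design[, "group2"] == design[, "strain3"] + design[, "strain4"])  # TRUE
all(design[, "group3"] == design[, "strain5"] + design[, "strain6"])  # TRUE

# Leaving group out of the formula (design2 is a hypothetical name)
# gives a matrix of full column rank.
design2 <- model.matrix(~rep+strain)
qr(design2)$rank == ncol(design2)   # TRUE
```

This also explains why reordering the formula only changes which coefficients are reported as not estimable: the same two redundant directions are still there.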
Thank you, that makes sense. What if I want to do an ANOVA-like test for any difference between the groups? I'm not sure how I would set up the contrasts for that if strain and rep are the only grouping factors.
Just take all of the pairwise contrasts between groups (using the comparison between averages that I described in my original answer) and cbind them into a contrast matrix. This will test for differences between any of the group averages in an ANOVA-like style.
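As a rough illustration of what such a contrast matrix could look like, here is a sketch that assumes a ~rep+strain design and assumes "comparison between averages" means averaging the strain coefficients within each group (strains 1-2, 3-4 and 5-6, as in the group factor from the question). The helper avg and the names g1, g2, g3 and con are hypothetical and not taken from the original answer.

```r
# Factors copied from the question; group membership follows its group factor.
rep    <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))
strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))
design <- model.matrix(~rep+strain)

# Hypothetical helper: weights on the design coefficients for the average of
# a set of strains. strain1 is the baseline, so it has no column of its own.
# The intercept and rep2 weights are left at zero because they are identical
# for every group and cancel in any difference between two groups.
avg <- function(strains) {
  w <- setNames(numeric(ncol(design)), colnames(design))
  cols <- intersect(paste0("strain", strains), colnames(design))
  w[cols] <- 1 / length(strains)
  w
}
g1 <- avg(1:2)
g2 <- avg(3:4)
g3 <- avg(5:6)

# All pairwise comparisons between group averages, bound column-wise.
con <- cbind(g2vs1 = g2 - g1, g3vs1 = g3 - g1, g3vs2 = g3 - g2)
con

# Assumed usage with edgeR (not run here; needs a DGEList y with
# dispersions already estimated):
# fit <- glmFit(y, design)
# lrt <- glmLRT(fit, contrast = con)
```

Only two of the three columns are linearly independent (g3vs2 equals g3vs1 minus g2vs1), so con[, 1:2] describes the same overall hypothesis if a redundant column ever causes trouble.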