Search
Question: EdgeR Design matrix not of full rank
0
3.0 years ago by
United States
kjclowers0 wrote:

I am trying to set up model where I have paired biological replicates as a blocking factor, and I am interested in the differential expression in individual strains as well as between specific groups of strains that were isolated from similar environments. I am getting the Matrix not of full rank error even though none of the columns in my design matrix are all 0s. Any advice on how to resolve this problem?

rep <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))

group <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3))

strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))

I then created by design:

design <- model.matrix(~rep+group+strain, data=data)

which looks like this:

> design

(Intercept) rep2 group2 group3 strain2 strain3 strain4 strain5 strain6

1            1    0      0      0       0       0       0       0       0

2            1    1      0      0       0       0       0       0       0

3            1    0      0      0       1       0       0       0       0

4            1    1      0      0       1       0       0       0       0

5            1    0      1      0       0       1       0       0       0

6            1    1      1      0       0       1       0       0       0

7            1    0      1      0       0       0       1       0       0

8            1    1      1      0       0       0       1       0       0

9            1    0      0      1       0       0       0       1       0

10           1    1      0      1       0       0       0       1       0

11           1    0      0      1       0       0       0       0       1

12           1    1      0      1       0       0       0       0       1

attr(,"assign")

[1] 0 1 2 2 3 3 3 3 3

attr(,"contrasts")

attr(,"contrasts")$rep [1] "contr.treatment" attr(,"contrasts")$group

[1] "contr.treatment"

attr(,"contrasts")\$strain

[1] "contr.treatment"

I then made my DGEList, normalized, and got an error while trying to estimate Dispersion.

y <- DGEList(counts=data[,1:12], group=group:strain:rep)

y<-calcNormFactors(y)

y <- estimateGLMCommonDisp(y,design)

> y <- estimateGLMCommonDisp(y,design)

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :

Design matrix not of full rank.  The following coefficients not estimable:

strain4 strain6

I also tried changing the order of variables in the model (~rep+strain+group) which results in the same error with group 2 and group 3 coefficients not estimable.

modified 3.0 years ago by Gordon Smyth33k • written 3.0 years ago by kjclowers0
5
3.0 years ago by
Aaron Lun19k
Cambridge, United Kingdom
Aaron Lun19k wrote:

If you examine your matrix, you'll notice that you have linear dependencies between your columns. For example, adding the columns strain5 and strain6 will give you the column for group3, while adding strain3 and strain4 will give you the column for group2. It's impossible to estimate these coefficients, because you can get an infinite number of solutions for the best fit.

To resolve this, you need to get rid of the dependencies. This can be done in an automated fashion by using the QR decomposition to pivot out dependent columns. However, I prefer to reparametrize the design matrix itself, in order to get more interpretable coefficients. In your case, the group factor is confounded by the strain factor, so I'd be inclined to just leave it out:

design <- model.matrix(~0+strain+rep)

This gives you a one-way layout for strains, with an additional blocking factor for the replicate. You can then compare between strains quite easily. You can also compare between groups of strains by setting up a contrast between the averages of each group (check out the last example in section 3.2.3 of the user's guide).

Thank you, that makes sense. What if I want to do an ANOVA-like test for any difference between the groups? I'm not sure how I would set up the contrasts for that if strain and rep are the only grouping factors.

Just take all of the pairwise contrasts between groups (using the comparison between averages that I described in my original answer) and cbind them into a contrast matrix. This will test for differences between any of the group averages in an ANOVA-like style.

1
3.0 years ago by
Gordon Smyth33k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth33k wrote:

Just to add to Aaron's post.

The reason why your design matrix is not of full rank is that your 'group' factor is redundant -- it is simply a coarser grouping of the levels of the strain factor that is also in the model. There is no way to estimate effects for redundant variables. So you have to remove it.