EdgeR Design matrix not of full rank
2
0
Entering edit mode
kjclowers • 0
@kjclowers-7945
Last seen 7.1 years ago
United States

I am trying to set up model where I have paired biological replicates as a blocking factor, and I am interested in the differential expression in individual strains as well as between specific groups of strains that were isolated from similar environments. I am getting the Matrix not of full rank error even though none of the columns in my design matrix are all 0s. Any advice on how to resolve this problem?

After reading in my data, I made the following variables:

rep <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))

group <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3))

strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))

 

I then created by design:

design <- model.matrix(~rep+group+strain, data=data)

which looks like this:

> design

   (Intercept) rep2 group2 group3 strain2 strain3 strain4 strain5 strain6

1            1    0      0      0       0       0       0       0       0

2            1    1      0      0       0       0       0       0       0

3            1    0      0      0       1       0       0       0       0

4            1    1      0      0       1       0       0       0       0

5            1    0      1      0       0       1       0       0       0

6            1    1      1      0       0       1       0       0       0

7            1    0      1      0       0       0       1       0       0

8            1    1      1      0       0       0       1       0       0

9            1    0      0      1       0       0       0       1       0

10           1    1      0      1       0       0       0       1       0

11           1    0      0      1       0       0       0       0       1

12           1    1      0      1       0       0       0       0       1

attr(,"assign")

[1] 0 1 2 2 3 3 3 3 3

attr(,"contrasts")

attr(,"contrasts")$rep

[1] "contr.treatment"

attr(,"contrasts")$group

[1] "contr.treatment"

attr(,"contrasts")$strain

[1] "contr.treatment"

 

I then made my DGEList, normalized, and got an error while trying to estimate Dispersion.

y <- DGEList(counts=data[,1:12], group=group:strain:rep)

y<-calcNormFactors(y)

y <- estimateGLMCommonDisp(y,design)

> y <- estimateGLMCommonDisp(y,design)

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 

  Design matrix not of full rank.  The following coefficients not estimable:

 strain4 strain6

 

I also tried changing the order of variables in the model (~rep+strain+group) which results in the same error with group 2 and group 3 coefficients not estimable.

edger • 14k views
ADD COMMENT
5
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 20 hours ago
The city by the bay

If you examine your matrix, you'll notice that you have linear dependencies between your columns. For example, adding the columns strain5 and strain6 will give you the column for group3, while adding strain3 and strain4 will give you the column for group2. It's impossible to estimate these coefficients, because you can get an infinite number of solutions for the best fit.

To resolve this, you need to get rid of the dependencies. This can be done in an automated fashion by using the QR decomposition to pivot out dependent columns. However, I prefer to reparametrize the design matrix itself, in order to get more interpretable coefficients. In your case, the group factor is confounded by the strain factor, so I'd be inclined to just leave it out:

design <- model.matrix(~0+strain+rep)

This gives you a one-way layout for strains, with an additional blocking factor for the replicate. You can then compare between strains quite easily. You can also compare between groups of strains by setting up a contrast between the averages of each group (check out the last example in section 3.2.3 of the user's guide).

ADD COMMENT
0
Entering edit mode

Thank you, that makes sense. What if I want to do an ANOVA-like test for any difference between the groups? I'm not sure how I would set up the contrasts for that if strain and rep are the only grouping factors.

ADD REPLY
0
Entering edit mode

Just take all of the pairwise contrasts between groups (using the comparison between averages that I described in my original answer) and cbind them into a contrast matrix. This will test for differences between any of the group averages in an ANOVA-like style.

ADD REPLY
1
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Just to add to Aaron's post.

The reason why your design matrix is not of full rank is that your 'group' factor is redundant -- it is simply a coarser grouping of the levels of the strain factor that is also in the model. There is no way to estimate effects for redundant variables. So you have to remove it.

ADD COMMENT

Login before adding your answer.

Traffic: 800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6