Search
Question: EdgeR Design matrix not of full rank
0
gravatar for kjclowers
3.4 years ago by
kjclowers0
United States
kjclowers0 wrote:

I am trying to set up model where I have paired biological replicates as a blocking factor, and I am interested in the differential expression in individual strains as well as between specific groups of strains that were isolated from similar environments. I am getting the Matrix not of full rank error even though none of the columns in my design matrix are all 0s. Any advice on how to resolve this problem?

After reading in my data, I made the following variables:

rep <- factor(c(1,2,1,2,1,2,1,2,1,2,1,2))

group <- factor(c(1,1,1,1,2,2,2,2,3,3,3,3))

strain <- factor(c(1,1,2,2,3,3,4,4,5,5,6,6))

 

I then created by design:

design <- model.matrix(~rep+group+strain, data=data)

which looks like this:

> design

   (Intercept) rep2 group2 group3 strain2 strain3 strain4 strain5 strain6

1            1    0      0      0       0       0       0       0       0

2            1    1      0      0       0       0       0       0       0

3            1    0      0      0       1       0       0       0       0

4            1    1      0      0       1       0       0       0       0

5            1    0      1      0       0       1       0       0       0

6            1    1      1      0       0       1       0       0       0

7            1    0      1      0       0       0       1       0       0

8            1    1      1      0       0       0       1       0       0

9            1    0      0      1       0       0       0       1       0

10           1    1      0      1       0       0       0       1       0

11           1    0      0      1       0       0       0       0       1

12           1    1      0      1       0       0       0       0       1

attr(,"assign")

[1] 0 1 2 2 3 3 3 3 3

attr(,"contrasts")

attr(,"contrasts")$rep

[1] "contr.treatment"

attr(,"contrasts")$group

[1] "contr.treatment"

attr(,"contrasts")$strain

[1] "contr.treatment"

 

I then made my DGEList, normalized, and got an error while trying to estimate Dispersion.

y <- DGEList(counts=data[,1:12], group=group:strain:rep)

y<-calcNormFactors(y)

y <- estimateGLMCommonDisp(y,design)

> y <- estimateGLMCommonDisp(y,design)

Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 

  Design matrix not of full rank.  The following coefficients not estimable:

 strain4 strain6

 

I also tried changing the order of variables in the model (~rep+strain+group) which results in the same error with group 2 and group 3 coefficients not estimable.

ADD COMMENTlink modified 3.4 years ago by Gordon Smyth35k • written 3.4 years ago by kjclowers0
5
gravatar for Aaron Lun
3.4 years ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

If you examine your matrix, you'll notice that you have linear dependencies between your columns. For example, adding the columns strain5 and strain6 will give you the column for group3, while adding strain3 and strain4 will give you the column for group2. It's impossible to estimate these coefficients, because you can get an infinite number of solutions for the best fit.

To resolve this, you need to get rid of the dependencies. This can be done in an automated fashion by using the QR decomposition to pivot out dependent columns. However, I prefer to reparametrize the design matrix itself, in order to get more interpretable coefficients. In your case, the group factor is confounded by the strain factor, so I'd be inclined to just leave it out:

design <- model.matrix(~0+strain+rep)

This gives you a one-way layout for strains, with an additional blocking factor for the replicate. You can then compare between strains quite easily. You can also compare between groups of strains by setting up a contrast between the averages of each group (check out the last example in section 3.2.3 of the user's guide).

ADD COMMENTlink written 3.4 years ago by Aaron Lun21k

Thank you, that makes sense. What if I want to do an ANOVA-like test for any difference between the groups? I'm not sure how I would set up the contrasts for that if strain and rep are the only grouping factors.

ADD REPLYlink written 3.4 years ago by kjclowers0

Just take all of the pairwise contrasts between groups (using the comparison between averages that I described in my original answer) and cbind them into a contrast matrix. This will test for differences between any of the group averages in an ANOVA-like style.

ADD REPLYlink modified 3.4 years ago • written 3.4 years ago by Aaron Lun21k
1
gravatar for Gordon Smyth
3.4 years ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

Just to add to Aaron's post.

The reason why your design matrix is not of full rank is that your 'group' factor is redundant -- it is simply a coarser grouping of the levels of the strain factor that is also in the model. There is no way to estimate effects for redundant variables. So you have to remove it.

ADD COMMENTlink modified 3.4 years ago • written 3.4 years ago by Gordon Smyth35k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 341 users visited in the last hour