Design matrix not of full rank
2
0
Entering edit mode
prat • 0
@prat-10026
Last seen 5.4 years ago

My design matrix is below for an RNASeq experiment and somewhat similar to Section 3.4.2 Blocking in the manual I'm trying to follow,

sample    condition    mouse
ctrl_minus_1    ctrl_minus    1
ctrl_plus_1    ctrl_plus    1
ko_minus_1    ko_minus    2
ko_plus_1    ko_plus    2
ctrl_minus_2    ctrl_minus    3
ctrl_plus_2    ctrl_plus    3
ko_minus_2    ko_minus    4
ko_plus_2    ko_plus    4
ctrl_minus_3    ctrl_minus    5
ctrl_plus_3    ctrl_plus    5
ko_minus_3    ko_minus    6
ko_plus_3    ko_plus    6

Where the design is as follows, two conditions (ctrl/ko) and for each treated/untreated with a drug (plus/minus).  Un/treated are paired from the same mouse and I want to take that into consideration the mouse-level variability...

My model is,
design   <- model.matrix(~mouse + condition)

and the comparisons of interest are,

my.contrasts <- makeContrasts(ctrl_plus_minus=conditionctrl_plus, ko_plus_minus=conditionko_plus - conditionko_minus, ctrl_plus=conditionko_plus - conditionctrl_plus,
ctrl_minus=conditionko_minus, ctrl_vs_ko= conditionctrl_plus  - (conditionko_plus - conditionko_minus), levels=design)

which gives the following error,

> my.DGEList  <- estimateGLMCommonDisp(my.DGEList, design, verbose=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
  Design matrix not of full rank.  The following coefficients not estimable:
 conditionko_plus
Calls: estimateGLMCommonDisp ... f -> adjustedProfileLik -> glmFit -> glmFit.default
Execution halted

I tried including the intercept in the model,

design   <- model.matrix(~0 + condition + mouse)
my.contrasts <- makeContrasts(ctrl_plus_minus=conditionctrl_plus - conditionctrl_minus, ko_plus_minus=conditionko_plus - conditionko_minus, ctrl_plus=conditionko_plus - conditionctrl_plus,
ctrl_minus=conditionko_minus - conditionctrl_minus, ctrl_vs_syk= (conditionctrl_plus - conditionctrl_minus) - (conditionko_plus - conditionko_minus), levels=design)

but get a similar error,

 my.DGEList  <- estimateGLMCommonDisp(my.DGEList, design, verbose=TRUE)
Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
  Design matrix not of full rank.  The following coefficients not estimable:
 mouse6
Calls: estimateGLMCommonDisp ... f -> adjustedProfileLik -> glmFit -> glmFit.default
Execution halted

Why is the design matrix not of full rank?

Thanks,
Prat

edger • 5.2k views
ADD COMMENT
3
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

I'll reframe your code a bit to make it easier to answer. I'll consider the following:

condition <- rep(c("ctrl_minus", "ctrl_plus", "ko_minus", "ko_plus"), 3)
condition <- factor(condition)
batch <- factor(rep(1:6, each=2))
design <- model.matrix(~0 + batch + condition)

If you look closely at design, you'll notice that the vector of row sums of columns 8 and 9 is equal to that of columns 2, 4 and 6. This means that you can get an infinite number of solutions; for example, if you increase the value of coefficients 8 and 9 by some value x, you can decrease the coefficients 2, 4 and 6 by the same value, and end up with the same fitted values. This is what edgeR is complaining about when it says that the matrix is not of full rank.

Now, it's a bit hard to diagnose these problems from the design alone. Another way is to have a look at colnames(design).

[1] "batch1"             "batch2"             "batch3"            
[4] "batch4"             "batch5"             "batch6"            
[7] "conditionctrl_plus" "conditionko_minus"  "conditionko_plus"  

The first 6 coefficients represent the average expression of the "minus" sample in each mouse. The last three coefficients represent the effect of drug treatment in each genotype. However, the conditionko_minus coefficient doesn't make sense here, because it would represent the effect of no treatment, and we don't need an extra coefficient for that. So, we chuck it out:

design <- design[,-8]

If you do this and run it through edgeR, you'll find that the error message will go away. This is because edgeR can now identify a single solution for the GLM fit. It's also easier to interpret the coefficients when you don't have nonsensical terms in your design matrix.

ADD COMMENT
0
Entering edit mode
b.nota ▴ 370
@bnota-7379
Last seen 4.2 years ago
Netherlands

Check section 3.5 Comparisons Both Between and Within Subjects of the edgeR manual.

ADD COMMENT

Login before adding your answer.

Traffic: 458 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6