3 factor design formula 'Model Matrix not full rank'
1
0
Entering edit mode
rrcutler ▴ 70
@rrcutler-10667
Last seen 4.1 years ago

Hello fellow DESeq2 users,

 I am having trouble figuring out how to set up the design formula for an experiment with 3 factors with up to 3 levels within each. I have narrowed down the problem to the positive controls I want to include but I am unsure why they are giving me the 'Model Matrix not full rank' error. They do not appear to be a linear combination or a combination of other factor levels to me. 

 I am able to successfully set up the experiment without the positive controls and a design with interaction terms like so:

   stage treatment surgery

1     18         C      11

2     18         C      11

3     18         C      11

4     18         C      11

5     18         C      11

6     18         R      11

7     18         R      11

8     18         R      11

9     18         R      11

10    18         R      11

11    18         C      12

12    18         C      12

13    18         C      12

14    18         C      12

15    18         C      12

16    18         R      12

17    18         R      12

18    18         R      12

19    18         R      12

20    18         R      12

21    30         C      11

22    30         C      11

23    30         C      11

24    30         C      11

25    30         C      11

26    30         R      11

27    30         R      11

28    30         R      11

29    30         R      11

30    30         R      11

31    30         C      12

32    30         C      12

33    30         C      12

34    30         C      12

35    30         C      12

36    30         R      12

37    30         R      12

38    30         R      12

39    30         R      12

40    30         R      12

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~ stage + surgery + treatment + stage:surgery + stage:treatment + stage:surgery:treatment + surgery:treatment)

 

When I try and include the positive controls, 5 in stage 18 and 30 respectively with no treatment or surgery, I am getting the errors:

 

     stage treatment surgery

1     18         C      11

2     18         C      11

3     18         C      11

4     18         C      11

5     18         C      11

6     18         R      11

7     18         R      11

8     18         R      11

9     18         R      11

10    18         R      11

11    18         C      12

12    18         C      12

13    18         C      12

14    18         C      12

15    18         C      12

16    18         R      12

17    18         R      12

18    18         R      12

19    18         R      12

20    18         R      12

21    18   NoTreat  NoSurg

22    18   NoTreat  NoSurg

23    18   NoTreat  NoSurg

24    18   NoTreat  NoSurg

25    18   NoTreat  NoSurg

26    30         C      11

27    30         C      11

28    30         C      11

29    30         C      11

30    30         C      11

31    30         R      11

32    30         R      11

33    30         R      11

34    30         R      11

35    30         R      11

36    30         C      12

37    30         C      12

38    30         C      12

39    30         C      12

40    30         C      12

41    30         R      12

42    30         R      12

43    30         R      12

44    30         R      12

45    30         R      12

46    30   NoTreat  NoSurg

47    30   NoTreat  NoSurg

48    30   NoTreat  NoSurg

49    30   NoTreat  NoSurg

50    30   NoTreat  NoSurg

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sTable, directory = "", design = ~ stage + surgery + treatment + stage:surgery + stage:treatment + stage:surgery:treatment + surgery:treatment)

Error in checkFullRank(modelMatrix) :

  the model matrix is not full rank, so the model cannot be fit as specified.

  One or more variables or interaction terms in the design formula are linear

  combinations of the others and must be removed.
deseq2 multiple factor design • 2.1k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

The problem is that you want to have treatment and surgery, but the "no treatment" and "no surgery" samples are the same. To make it more clear, suppose you have control (C), and treatments W,X for variable 1 and Y,Z for variable 2:

1 2
C C
C C
W Y
W Z
X Y
X Z

While the treatments are blocked, and you could run the model without the C's. When you want to add the C's, you never get to see what the effect of W is when variable 2 is control, or vice versa for variable 1. So even just with main effects, it will not give not a full rank design matrix, because W + X = Y + Z (here where the letter stands for the effect relative to C as reference level). There are many (infinite) set of coefficients which are equally likely for those 4 coefficients.

ADD COMMENT
0
Entering edit mode

Hi Michael,

Thanks for shedding light on this. So in your simplified example, a solution to this would be to change the treatments of the control in one variable so it looks something like this:

1 2
C Y
C Z
W Y
W Z
X Y
X Z

Although this is obviously infeasible. As you pointed out I am able to run the model when removing the positive controls. Is there another way that I am missing where I can include these positive controls into the design? A separate factor? I want to make comparisons with these against the controls and also get the interactions. My knowledge in linear modeling and setting up designs is limited. 

ADD REPLY
1
Entering edit mode

You can combine the two variables into one factor, with levels: control, WY, WZ, XY and XZ. Then you can have interactions of this with another variable. But you can't have them both in the design because you don't have samples where only one is the control. At some point in the future you can discuss the experimental design with a local statistician, who can explain this part.

ADD REPLY
0
Entering edit mode

Okay that works! I will update on how the experimental design turns out.

ADD REPLY
0
Entering edit mode

Seeing this recent post: 

C: DESeq2- what to do when two conditions share controls?

It seems like we could use the solution that you proposed there, by coding the controls in the surgery column as "11" instead of "NoSurg" and then leaving the "NoTreat", level in the treatment column to distinguish between the two. However, after trying this with the design:

~stage + stage:treatment + stage:treatment:surgery

I still am getting the same "model matrix is not full rank" error. I'm sure this is for the same reasons that you identified in the original issue. Although this seems to me like a similar problem to the post linked to as the "NoTreat NoSurg" Samples can be seen as shared controls to both the "C", "R" and "11", "12", treatments at the respective stages. 

Perhaps I am still not understanding:

"But you can't have them both in the design because you don't have samples where only one is the control."

Your thoughts on this would be appreciated. (Still looking for a statistician)

 

Thanks again,

-R

ADD REPLY
1
Entering edit mode

Unfortunately, I don't have sufficient time right now to go into why an alternate design produces an error. If you want to explore on your own, you can take a look at the matrices yourself with model.matrix(design(dds), colData(dds)) and notice which columns are linear combinations of which other columns (which makes the set of equations not possible to solve for a unique set of coefficients).

Re: that previous sentence, go back to this example I gave before above: "you never get to see what the effect of W is when variable 2 is control, or vice versa for variable 1". You need those samples to tease apart the independent effects of either, but you didn't generate them. 

The best approach (the one I gave above) is to combine those two variables, because this generates a factor when the levels comprise all of the actual conditions you generated samples for. Then no problem with the rank of the model matrix.

ADD REPLY
0
Entering edit mode

Thanks so much for clearing this up!

ADD REPLY

Login before adding your answer.

Traffic: 454 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6