Question

Design formula DESeq2

Ysland • 0

Hi,

Although I have read the extensive DESeq2 vignette and several forum threads, I am unable to create a design formula without getting the error "the model matrix is not full rank".

My experimental design is a bit complex:

Id  Section  Stage    Egg
1   Ant      0hours   Egg_1
2   Ant      0hours   Egg_2
3   Ant      0hours   Egg_3
4   Post     0hours   Egg_1
5   Post     0hours   Egg_2
6   Post     0hours   Egg_3
8   Ant      24hours  Egg_4
9   Ant      24hours  Egg_5
10  Ant      24hours  Egg_6
11  Post     24hours  Egg_4
12  Post     24hours  Egg_5
13  Post     24hours  Egg_6
16  Ant      8hours   Egg_7
17  Ant      8hours   Egg_8
18  Ant      8hours   Egg_9
19  Post     8hours   Egg_7
20  Post     8hours   Egg_8
21  Post     8hours   Egg_9
33  Ant      10hours  Egg_10
34  Ant      10hours  Egg_11
35  Ant      10hours  Egg_12
36  Ant      10hours  Egg_13
37  Post     10hours  Egg_10
38  Post     10hours  Egg_11
39  Post     10hours  Egg_12
40  Post     10hours  Egg_13

We took 3 eggs at each of 4 different stages (0hours, 8hours, 10hours, 24hours), except for stage 10hours, where we have 4 eggs. For each egg, we sequenced the Anterior (Ant) and Posterior (Post) sections.

I want to find genes that are differentially expressed between Anterior and Posterior at each stage and across stages.

My clustering analysis shows that the main clustering factor is coming from the same egg. If I use limma's removeBatchEffect() I can get rid of this effect, but I don't know how to take it into account in the differential expression analysis.

I tried different design formulas, including ~Egg+Stage:Section, but I always get the error "the model matrix is not full rank". Any idea how I should design the formula?
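The rank problem with ~Egg+Stage:Section can be reproduced in base R without DESeq2. The sketch below rebuilds the sample table from the question (the object names `n_eggs`, `stage_of_egg`, `egg_id` and `mm` are illustrative, and the ordering of the Stage levels is an assumption) and compares the number of model matrix columns with the matrix rank. Each egg belongs to exactly one stage, so the Egg columns already determine the stage, and the Stage:Section columns are partly redundant with them.

```r
# Rebuild the sample table from the question: 13 eggs, each sequenced as Ant and Post.
n_eggs <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
stage_of_egg <- rep(names(n_eggs), times = n_eggs)   # stage of Egg_1 ... Egg_13
egg_id <- paste0("Egg_", seq_along(stage_of_egg))

coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(stage_of_egg, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),  # assumed order
  Egg     = factor(rep(egg_id, times = 2), levels = egg_id)
)

# Model matrix for the design that triggers the error.
mm <- model.matrix(~ Egg + Stage:Section, coldata)

# A full-rank matrix would have rank equal to its number of columns.
print(c(columns = ncol(mm), rank = qr(mm)$rank))
```

When the rank is smaller than the number of columns, some coefficients cannot be estimated separately, which is the situation the "not full rank" error reports.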

Thank you so much!

deseq2 • 805 views

ADD COMMENT • link updated 4.9 years ago by Michael Love 41k • written 4.9 years ago by Ysland • 0

score 1 · Answer 1 · 2019-05-27

Michael Love 41k

As Egg is nested within Stage, and you want to test for Stage-specific effects while controlling for Egg, you should probably follow this section of the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

ADD COMMENT • link 4.9 years ago Michael Love 41k

Thank you so much for your reply.

I saw this vignette section, but to be honest, I don't know how to apply it to my design. I don't know how I should re-enumerate my samples (ind.n in the vignette) in a way that makes sense without creating a column that is redundant with one of the existing ones...

For instance, if I re-enumerate within each "Section", ind.n is redundant with Egg.

Id  Section  Stage    Egg     ind.n
1   Ant      0hours   Egg_1   1
2   Ant      0hours   Egg_2   2
3   Ant      0hours   Egg_3   3
8   Ant      24hours  Egg_4   4
9   Ant      24hours  Egg_5   5
10  Ant      24hours  Egg_6   6
16  Ant      8hours   Egg_7   7
17  Ant      8hours   Egg_8   8
18  Ant      8hours   Egg_9   9
33  Ant      10hours  Egg_10  10
34  Ant      10hours  Egg_11  11
35  Ant      10hours  Egg_12  12
36  Ant      10hours  Egg_13  13
4   Post     0hours   Egg_1   1
5   Post     0hours   Egg_2   2
6   Post     0hours   Egg_3   3
11  Post     24hours  Egg_4   4
12  Post     24hours  Egg_5   5
13  Post     24hours  Egg_6   6
19  Post     8hours   Egg_7   7
20  Post     8hours   Egg_8   8
21  Post     8hours   Egg_9   9
37  Post     10hours  Egg_10  10
38  Post     10hours  Egg_11  11
39  Post     10hours  Egg_12  12
40  Post     10hours  Egg_13  13

If I try to re-enumerate within each section for each stage, ind.n is redundant with Stage....

Id  Section  Stage    Egg     ind.n
1   Ant      0hours   Egg_1   1
2   Ant      0hours   Egg_2   1
3   Ant      0hours   Egg_3   1
8   Ant      24hours  Egg_4   2
9   Ant      24hours  Egg_5   2
10  Ant      24hours  Egg_6   2
16  Ant      8hours   Egg_7   3
17  Ant      8hours   Egg_8   3
18  Ant      8hours   Egg_9   3
33  Ant      10hours  Egg_10  4
34  Ant      10hours  Egg_11  4
35  Ant      10hours  Egg_12  4
36  Ant      10hours  Egg_13  4
4   Post     0hours   Egg_1   1
5   Post     0hours   Egg_2   1
6   Post     0hours   Egg_3   1
11  Post     24hours  Egg_4   2
12  Post     24hours  Egg_5   2
13  Post     24hours  Egg_6   2
19  Post     8hours   Egg_7   3
20  Post     8hours   Egg_8   3
21  Post     8hours   Egg_9   3
37  Post     10hours  Egg_10  4
38  Post     10hours  Egg_11  4
39  Post     10hours  Egg_12  4
40  Post     10hours  Egg_13  4

How should I create ind.n in a way that makes sense? I honestly don't see how to do it...

Thank you so much!

ADD REPLY • link 4.9 years ago Ysland • 0

The correspondence is:

vignette   : your example
individual : egg
condition  : section
group      : stage

You have two sections per egg, and the eggs are nested within stage. In the vignette, we have two conditions per individual, and the individuals are nested within group. In the vignette, we renumber the individuals so that they have the same levels across groups (1-3). So you should renumber the eggs within each stage so that they run from 1-3, or 1-4 for the 10hours stage.
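The renumbering described here can be done programmatically instead of by hand. This is a minimal base R sketch, assuming the sample table from the question; the object names and the ordering of the Stage levels are illustrative assumptions. It numbers the eggs 1, 2, 3, ... within each stage, so both sections of the same egg receive the same ind.n.

```r
# Rebuild the sample table from the question: 13 eggs, each sequenced as Ant and Post.
n_eggs <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
stage_of_egg <- rep(names(n_eggs), times = n_eggs)
egg_id <- paste0("Egg_", seq_along(stage_of_egg))

coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(stage_of_egg, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),  # assumed order
  Egg     = factor(rep(egg_id, times = 2), levels = egg_id)
)

# Within each stage, replace the egg identity by its position among that stage's eggs.
coldata$ind.n <- factor(ave(
  as.integer(coldata$Egg), coldata$Stage,
  FUN = function(x) match(x, unique(x))
))

# Number of samples per stage and ind.n level; only 10hours has an egg number 4.
print(table(coldata$Stage, coldata$ind.n))
```

ind.n is stored as a factor so that a design formula treats it as a categorical label, not as a numeric covariate.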

ADD REPLY • link 4.9 years ago Michael Love 41k

Thank you so much Michael for your support and for such a great tool!

I tried what you said, and if I understand correctly, it should look like this:

Id  Section  Stage    Egg     ind.n
1   Ant      0hours   Egg_1   1
2   Ant      0hours   Egg_2   2
3   Ant      0hours   Egg_3   3
8   Ant      24hours  Egg_4   1
9   Ant      24hours  Egg_5   2
10  Ant      24hours  Egg_6   3
16  Ant      8hours   Egg_7   1
17  Ant      8hours   Egg_8   2
18  Ant      8hours   Egg_9   3
33  Ant      10hours  Egg_10  1
34  Ant      10hours  Egg_11  2
35  Ant      10hours  Egg_12  3
36  Ant      10hours  Egg_13  4
4   Post     0hours   Egg_1   1
5   Post     0hours   Egg_2   2
6   Post     0hours   Egg_3   3
11  Post     24hours  Egg_4   1
12  Post     24hours  Egg_5   2
13  Post     24hours  Egg_6   3
19  Post     8hours   Egg_7   1
20  Post     8hours   Egg_8   2
21  Post     8hours   Egg_9   3
37  Post     10hours  Egg_10  1
38  Post     10hours  Egg_11  2
39  Post     10hours  Egg_12  3
40  Post     10hours  Egg_13  4

And the formula is ~ Stage + Stage:ind.n + Stage:Section

Now resultsNames() shows the Section-by-Stage combinations (SectionAnt.StageXX and SectionPost.StageXX), except for those with stage 0hours: SectionAnt.Stage0hours and SectionPost.Stage0hours are missing, and I don't understand why...

ADD REPLY • link 4.9 years ago Ysland • 0

Can you post the code that you are using? I'm surprised that you have Section in front of Stage in your resultsNames. That's not how it comes out in the vignette.

ADD REPLY • link 4.9 years ago Michael Love 41k

Sorry, the formula mentioned above (~ Stage + Stage:ind.n + Stage:Section) doesn't work; it gives the full rank error.

The formula ~ Section + Section:ind.n + Section:Stage runs, but it is the one that is missing a combination: "SectionAnt.Stage0hours".

Using the sample table above as "coldata":

library(DESeq2)
# Counts_Table: count matrix with one column per row of coldata
dds <- DESeqDataSetFromMatrix(countData = Counts_Table, colData = coldata,
                              design = ~ Section + Section:ind.n + Section:Stage)
dds <- DESeq(dds)
resultsNames(dds)

 [1] "Intercept"                 "Section_Post_vs_Ant"
 [3] "SectionAnt.ind.n2"         "SectionPost.ind.n2"
 [5] "SectionAnt.ind.n3"         "SectionPost.ind.n3"
 [7] "SectionAnt.ind.n4"         "SectionPost.ind.n4"
 [9] "SectionAnt.Stage24hours"   "SectionPost.Stage24hours"
[11] "SectionAnt.Stage8hours"    "SectionPost.Stage8hours"
[13] "SectionAnt.Stage10hours"   "SectionPost.Stage10hours"

# This works:
res_1 <- results(dds, contrast = list("SectionPost.Stage24hours", "SectionAnt.Stage24hours"))

# This doesn't work, because neither name appears in resultsNames(dds):
res_2 <- results(dds, contrast = list("SectionPost.Stage0hours", "SectionAnt.Stage0hours"))

It is really driving me crazy... It seems like something easy to do, but I can't manage to make it work...

ADD REPLY • link 4.9 years ago Ysland • 0

Read over the section I linked to above, the entire section. Dealing with the missing samples is handled as well.
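The missing-sample handling referred to here amounts to dropping model matrix columns that are entirely zero. Below is a base R sketch of that step for this design, assuming the sample table from the question with eggs renumbered within stage; the object names and the ordering of the Stage levels are illustrative assumptions. Because only stage 10hours has a fourth egg, the columns for egg 4 in the other three stages contain only zeros.

```r
# Rebuild the sample table from the question, with eggs renumbered within stage.
n_eggs <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
stage_of_egg <- rep(names(n_eggs), times = n_eggs)
egg_id <- paste0("Egg_", seq_along(stage_of_egg))

coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(stage_of_egg, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),  # assumed order
  Egg     = factor(rep(egg_id, times = 2), levels = egg_id)
)
coldata$ind.n <- factor(ave(as.integer(coldata$Egg), coldata$Stage,
                            FUN = function(x) match(x, unique(x))))

# Model matrix for the nested design, before any clean-up.
m1 <- model.matrix(~ Stage + Stage:ind.n + Stage:Section, coldata)

# Columns that are zero for every sample correspond to eggs that do not exist.
all_zero <- apply(m1, 2, function(x) all(x == 0))
print(names(which(all_zero)))

# Drop them and confirm that the remaining matrix is full rank.
m1 <- m1[, !all_zero]
print(c(columns = ncol(m1), rank = qr(m1)$rank))

# In DESeq2 the reduced matrix would then replace the formula, for example
# dds <- DESeq(dds, full = m1), where dds is an existing DESeqDataSet.
```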

ADD REPLY • link 4.9 years ago Michael Love 41k

Using the formula ~ Stage + Stage:ind.n + Stage:Section and removing the all-zero columns makes DESeq work...

But resultsNames() does not contain my contrasts of interest...

resultsNames(dds)
 [1] "Intercept"                 "24hours"
 [3] "8hours"                    "10hours"
 [5] "Stage0hours.ind.n2"        "Stage24hours.ind.n2"
 [7] "Stage8hours.ind.n2"        "Stage10hours.ind.n2"
 [9] "Stage0hours.ind.n3"        "Stage24hours.ind.n3"
[11] "Stage8hours.ind.n3"        "Stage10hours.ind.n3"
[13] "Stage10hours.ind.n4"       "Stage0hours.SectionPost"
[15] "Stage24hours.SectionPost"  "Stage8hours.SectionPost"
[17] "Stage10hours.SectionPost"

It contains all the StageX.SectionPost terms but none of the StageX.SectionAnt terms.

In the vignette, instead of comparing "grpY.cndB" vs "grpX.cndB", how would you compare "grpY.cndA" vs "grpX.cndA"? It is not in the model matrix...

ADD REPLY • link 4.9 years ago Ysland • 0

This is discussed in the vignette. See "Note on factor levels".

ADD REPLY • link 4.9 years ago Michael Love 41k

Thank you!

But this section explains that the reference group is chosen by alphabetical order, and that this can be changed by using contrast or by releveling the factors.

I can run results(dds, name = "Stage0hours.SectionPost")

But the thing is that I don't want to compare Stage0hours.SectionPost with the overall baseline "SectionAnt"; I want to compare it with "Stage0hours.SectionAnt".

But I can't specify this using contrast, because StageX.SectionAnt is not in resultsNames(), and I get the error: all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'

ADD REPLY • link 4.9 years ago Ysland • 0

Oh I didn't understand your question.

StageX.SectionPost gives the Post vs Ant comparison for Stage X, controlling for individual-level differences. This is similar to the vignette section where we say:

"Above, the terms grpX.cndB and grpY.cndB give the group-specific condition effects, in other words, the condition B vs A effect for group X samples, and likewise for group Y samples."
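To illustrate why no StageX.SectionAnt term is needed, here is a small base R sketch. It uses an ordinary linear model (`lm`) on simulated log-scale values for one hypothetical gene as a stand-in for DESeq2's negative binomial GLM, so only the meaning of the coefficient carries over, not the numbers. The simulated values, the `true_shift` effects and all object names are assumptions made for illustration. With this design, the Stage0hours:SectionPost coefficient is the average Post minus Ant difference across the eggs of stage 0hours, which is already the within-stage comparison.

```r
set.seed(1)

# Rebuild the sample table from the question, with eggs renumbered within stage.
n_eggs <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
stage_of_egg <- rep(names(n_eggs), times = n_eggs)
egg_id <- paste0("Egg_", seq_along(stage_of_egg))
coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(stage_of_egg, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),  # assumed order
  Egg     = factor(rep(egg_id, times = 2), levels = egg_id)
)
coldata$ind.n <- factor(ave(as.integer(coldata$Egg), coldata$Stage,
                            FUN = function(x) match(x, unique(x))))

# Simulated (not real) values: egg offset + stage-specific Post shift + noise.
egg_offset <- rnorm(nlevels(coldata$Egg))[as.integer(coldata$Egg)]
true_shift <- c("0hours" = 0.5, "8hours" = 1, "10hours" = 1.5, "24hours" = 2)
y <- egg_offset +
  (coldata$Section == "Post") * true_shift[as.character(coldata$Stage)] +
  rnorm(nrow(coldata), sd = 0.1)

# lm() reports NA for the all-zero columns and estimates the remaining terms.
fit <- lm(y ~ Stage + Stage:ind.n + Stage:Section, data = coldata)
print(coef(fit)["Stage0hours:SectionPost"])

# Paired Post - Ant difference, averaged over the eggs of stage 0hours.
ant  <- y[coldata$Stage == "0hours" & coldata$Section == "Ant"]
post <- y[coldata$Stage == "0hours" & coldata$Section == "Post"]
print(mean(post - ant))
```

In DESeq2 terms, results(dds, name = "Stage0hours.SectionPost") would therefore be the Post vs Ant test within stage 0hours. Following the vignette's grpY.cndB vs grpX.cndB example, a contrast such as list("Stage24hours.SectionPost", "Stage0hours.SectionPost") would test whether that effect differs between two stages.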

ADD REPLY • link 4.9 years ago Michael Love 41k