Hi,
Although I have read the extensive information in the DESeq2 vignette and several comments on forums, I am unable to create a design formula without getting the error "the model matrix is not full rank".
My experimental design is a bit complex:
Id Section Stage Egg
1 Ant 0hours Egg_1
2 Ant 0hours Egg_2
3 Ant 0hours Egg_3
4 Post 0hours Egg_1
5 Post 0hours Egg_2
6 Post 0hours Egg_3
8 Ant 24hours Egg_4
9 Ant 24hours Egg_5
10 Ant 24hours Egg_6
11 Post 24hours Egg_4
12 Post 24hours Egg_5
13 Post 24hours Egg_6
16 Ant 8hours Egg_7
17 Ant 8hours Egg_8
18 Ant 8hours Egg_9
19 Post 8hours Egg_7
20 Post 8hours Egg_8
21 Post 8hours Egg_9
33 Ant 10hours Egg_10
34 Ant 10hours Egg_11
35 Ant 10hours Egg_12
36 Ant 10hours Egg_13
37 Post 10hours Egg_10
38 Post 10hours Egg_11
39 Post 10hours Egg_12
40 Post 10hours Egg_13
We took 3-4* eggs at 4 different stages (0hours, 8hours, 10hours, 24hours). For each Egg, we sequenced the Anterior (Ant) and Posterior (Post) sections. * Notice I only have 4 eggs for stage 10hours.
I want to find genes that are differentially expressed between Anterior and Posterior at each stage and across stages.
My clustering analysis shows that the main clustering factor is being from the same egg. If I use limma's removeBatchEffect() I can get rid of this effect, but I don't know how to take it into account for the differential expression analysis.
I tried different design formulas, including ~Egg+Stage:Section
but I always get the error "the model matrix is not full rank". Any idea how I should design the formula?
Thank you so much!
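For readers following along, here is a minimal base R sketch of why ~Egg+Stage:Section cannot be full rank for this table. It rebuilds the sample sheet from the post (the object name coldata and the helper vectors are my own, and the rows are ordered by Section rather than by Id) and compares the number of columns of the model matrix with its rank. It does not call DESeq2; it only assumes that DESeq2 builds its design from the same kind of model matrix.

```r
# Rebuild the sample table from the post (rows ordered by Section, not by Id).
n_eggs    <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
egg_id    <- seq_len(sum(n_eggs))                  # Egg_1 .. Egg_13
egg_stage <- rep(names(n_eggs), times = n_eggs)    # stage each egg was taken at
coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(egg_stage, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),
  Egg     = factor(paste0("Egg_", rep(egg_id, times = 2)),
                   levels = paste0("Egg_", egg_id))
)

# Each egg occurs in exactly one stage, so Egg already determines Stage.
eggs_by_stage  <- table(coldata$Egg, coldata$Stage)
stages_per_egg <- rowSums(eggs_by_stage > 0)

# Model matrix for the formula tried in the post, and its rank.
mm      <- model.matrix(~ Egg + Stage:Section, data = coldata)
rank_mm <- qr(mm)$rank
cat("columns:", ncol(mm), " rank:", rank_mm, "\n")
```

Because every egg belongs to a single stage, the Egg columns already span the Stage differences, so the Stage:Section columns add linearly dependent combinations and the rank comes out lower than the number of columns.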
Thank you so much for your reply.
I saw this vignette section, but to be honest, I don't know how to apply it to my design. I don't know how I should re-enumerate my samples (ind.n in the vignette) in a way that makes sense without creating a column that is redundant with one of the existing ones...
For instance, if I re-enumerate within each "Section", ind.n is redundant with Egg.
If I try to re-enumerate within each section for each stage, ind.n is redundant with Stage....
How should I create this ind.n in a way that makes sense? I honestly don't see the way to do it...
Thank you so much!
The correspondence is:
vignette : your example
individual : egg
condition : section
group : stage
You have two sections per egg, and the eggs are nested within stage. In the vignette, we have two conditions per individual, and the individuals are nested within group. In the vignette, we re-number the individuals so that they have the same levels across groups (1-3). So you renumber the eggs within each stage such that they number from 1-3, or 1-4 for the 10 hour stage.
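A minimal base R sketch of that renumbering, under the assumption that the sample sheet is a data frame with the Section, Stage and Egg columns from the original post. The object name coldata and the use of ave() are illustrative choices of mine, not code from the vignette.

```r
# Rebuild the sample table from the post (rows ordered by Section, not by Id).
n_eggs    <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
egg_id    <- seq_len(sum(n_eggs))
egg_stage <- rep(names(n_eggs), times = n_eggs)
coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(egg_stage, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),
  Egg     = factor(paste0("Egg_", rep(egg_id, times = 2)),
                   levels = paste0("Egg_", egg_id))
)

# Renumber the eggs within each stage: 1-3, or 1-4 for the 10 hour stage.
# ave() applies the function separately to the eggs of each stage.
coldata$ind.n <- factor(ave(as.integer(coldata$Egg), coldata$Stage,
                            FUN = function(x) as.integer(factor(x))))

# One row per egg, to check the mapping Egg -> ind.n by eye.
print(unique(coldata[order(coldata$Egg), c("Stage", "Egg", "ind.n")]))

# Samples per Stage x ind.n cell: 2 (Ant and Post), or 0 where there is no 4th egg.
cells <- table(coldata$Stage, coldata$ind.n)
print(cells)
```

Both sections of the same egg get the same ind.n, and the same labels are reused in every stage, which is what lets Stage:ind.n absorb the egg-to-egg differences without duplicating the Stage columns.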
Thank you so much Michael for your support and for such a great tool!
I tried what you said and, if I understand it correctly, it should look like:
And the formula
Stage + Stage:ind.n + Stage:Section
Now, the resultsNames() shows me the combinations of Section_Stage
SectionAnt.StageXX SectionPost.StageXX
except the combinations with stage 0hours, which are missing: SectionAnt.Stage0hours and SectionPost.Stage0hours
and I don't get why...
Can you put the code that you are using? I'm surprised that you have section in front of stage in your resultsNames. That's not how it comes out in the vignette.
Sorry, the above-mentioned formula (
~Stage + Stage:ind.n + Stage:Section
) doesn't work; it gives the full rank error.
The formula
~ Section + Section:ind.n + Section:Stage
could work, but it is the one missing a combination: "SectionAnt.Stage0hours"
Using the above design matrix as "coldata"
It is really driving me crazy... It seems like something easy to do, but I can't manage to make it work...
Read over the section I linked to above, the entire section. Dealing with the missing samples is handled as well.
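As a rough base R sketch of what that handling amounts to for this design: only the 10 hour stage has a fourth egg, so the Stage:ind.n columns for ind.n 4 in the other stages are all zero, and dropping them leaves a full rank matrix. The object names (coldata, mm, mm_full) are illustrative, and the DESeq() call in the final comment is only an indication of how the vignette passes a custom matrix; it is not run here.

```r
# Sample table from the post, with eggs renumbered within stage (ind.n).
n_eggs    <- c("0hours" = 3, "24hours" = 3, "8hours" = 3, "10hours" = 4)
egg_id    <- seq_len(sum(n_eggs))
egg_stage <- rep(names(n_eggs), times = n_eggs)
coldata <- data.frame(
  Section = factor(rep(c("Ant", "Post"), each = length(egg_id))),
  Stage   = factor(rep(egg_stage, times = 2),
                   levels = c("0hours", "8hours", "10hours", "24hours")),
  Egg     = factor(paste0("Egg_", rep(egg_id, times = 2)),
                   levels = paste0("Egg_", egg_id))
)
coldata$ind.n <- factor(ave(as.integer(coldata$Egg), coldata$Stage,
                            FUN = function(x) as.integer(factor(x))))

# Full model matrix: contains columns for a 4th egg in stages that have only 3.
mm <- model.matrix(~ Stage + Stage:ind.n + Stage:Section, data = coldata)

# Drop the columns that are zero for every sample.
all_zero <- apply(mm, 2, function(x) all(x == 0))
print(colnames(mm)[all_zero])
mm_full <- mm[, !all_zero]

cat("columns:", ncol(mm_full), " rank:", qr(mm_full)$rank, "\n")

# With DESeq2 loaded, the cleaned matrix would then be supplied instead of a
# formula, along the lines of: dds <- DESeq(dds, full = mm_full)
```

The remaining Stage:Section columns are the per-stage Post vs Ant terms (model.matrix() names them with a colon, for example Stage0hours:SectionPost).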
Using the formula
~ Stage + Stage:ind.n + Stage:Section
and removing the all-zero columns makes DESeq work... But resultsNames() does not contain my contrasts of interest...
It contains all the StageX.SectionPost but not the StageX.SectionAnt.
In the vignette, instead of comparing "grpY.cndB" vs "grpX.cndB", how would you compare "grpY.cndA" vs "grpX.cndA"? It is not in the model matrix...
This is discussed in the vignette. See "Note on factor levels".
Thank you!
But this section explains that the reference group is chosen by alphabetical order, and that this can be changed by using
contrast
or by releveling the factors.
I can do
results(dds, name = "Stage0hours.SectionPost")
But the thing is that I don't want to compare Stage0hours.SectionPost with the baseline "Section.Ant", but with "Stage0hours.Section.Ant".
But I can't indicate this using contrast because StageX.SectionAnt is not in resultsNames():
all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object)'
Oh I didn't understand your question.
StageX.SectionPost
gives the Post vs Ant comparison for Stage X, controlling for individual-level differences. This is similar to the vignette section where we say: