I'm trying to do a parental allele-specific expression analysis using DESeq2 based on this approach from Michael Love and this post elsewhere on this forum. I have the following samples:
cross allele time mouse
1 BxC mat 0 1
2 BxC mat 0 2
3 BxC mat 0 3
4 CxB mat 0 4
5 CxB mat 0 5
6 CxB mat 0 6
7 BxC mat 1 7
8 BxC mat 1 8
9 BxC mat 1 9
10 BxC mat 2 10
11 BxC mat 2 11
12 BxC mat 2 12
13 BxC mat 5 13
14 BxC mat 5 14
15 BxC mat 5 15
16 BxC mat 7 16
17 BxC mat 7 17
18 BxC mat 7 18
19 BxC pat 0 1
20 BxC pat 0 2
21 BxC pat 0 3
22 CxB pat 0 4
23 CxB pat 0 5
24 CxB pat 0 6
25 BxC pat 1 7
26 BxC pat 1 8
27 BxC pat 1 9
28 BxC pat 2 10
29 BxC pat 2 11
30 BxC pat 2 12
31 BxC pat 5 13
32 BxC pat 5 14
33 BxC pat 5 15
34 BxC pat 7 16
35 BxC pat 7 17
36 BxC pat 7 18
Explanation of variables
cross: parental mouse strains for the sample;BxCmeans maternal C57BL/6J and paternal CAST/EiJ,CxBmeans the opposite.allele:maternal orpaternal allelic counts for the associated sampletime: mouse age normalized to my earliest time point, E8.5mouse: the individual mouse sequenced
My ideal design is ~0+cross+time+mouse+allele+allele:time, but unsurprisingly this doesn't produce a full rank model matrix as each individual mouse is part of an isolated group of 3. The DESeq2 vignette suggests that the solution is to create a nested "individual-in-group" variable, which would look like this:
cross allele time mouse mouseInGroup
1 BxC mat 0 1 1
2 BxC mat 0 2 2
3 BxC mat 0 3 3
4 CxB mat 0 4 1
5 CxB mat 0 5 2
6 CxB mat 0 6 3
7 BxC mat 1 7 1
8 BxC mat 1 8 2
9 BxC mat 1 9 3
10 BxC mat 2 10 1
11 BxC mat 2 11 2
12 BxC mat 2 12 3
13 BxC mat 5 13 1
14 BxC mat 5 14 2
15 BxC mat 5 15 3
16 BxC mat 7 16 1
17 BxC mat 7 17 2
18 BxC mat 7 18 3
19 BxC pat 0 1 1
20 BxC pat 0 2 2
21 BxC pat 0 3 3
22 CxB pat 0 4 1
23 CxB pat 0 5 2
24 CxB pat 0 6 3
25 BxC pat 1 7 1
26 BxC pat 1 8 2
27 BxC pat 1 9 3
28 BxC pat 2 10 1
29 BxC pat 2 11 2
30 BxC pat 2 12 3
31 BxC pat 5 13 1
32 BxC pat 5 14 2
33 BxC pat 5 15 3
34 BxC pat 7 16 1
35 BxC pat 7 17 2
36 BxC pat 7 18 3
Since each group is defined by both cross and time, my understanding is my design would need to be ~0+cross+time+cross:time:mouseInGroup+allele+allele:time. This still does not produce a full rank model matrix, though. The DESeq2 vignette suggests this may also be a result of some columns being all 0s:
all.zero = colSums(design) == 0
all.zero
crossBxC crossCxB time allelepat
FALSE FALSE FALSE FALSE
time:allelepat crossBxC:time:mouseInGroup1 crossCxB:time:mouseInGroup1 crossBxC:time:mouseInGroup2
FALSE FALSE TRUE FALSE
crossCxB:time:mouseInGroup2 crossBxC:time:mouseInGroup3 crossCxB:time:mouseInGroup3
TRUE FALSE TRUE
Removing those columns, however...
idx = which(all.zero)
testDesign = design[,-idx]
colSums(testDesign) == 0
crossBxC crossCxB time allelepat
FALSE FALSE FALSE FALSE
time:allelepat crossBxC:time:mouseInGroup1 crossBxC:time:mouseInGroup2 crossBxC:time:mouseInGroup3
FALSE FALSE FALSE FALSE
... still doesn't produce a model matrix that satisfies DESeq2. Any idea what the problem is and how to solve it? My suspicion is the problem is related to the fact I only have CxB samples at time 0, but they're also the only way for me to determine if a gene's ASE is due to the parent-of-origin's strain instead of actual parental allele regulation. Any help is appreciated!
Edit: for the sake of ease, some code to build the sample table:
cross = rep(c("BxC", "CxB", rep("BxC", 4)), each = 3, times = 2)
allele = rep(c("mat", "pat"), each = 18)
time = rep(c(0, 0, 1, 2, 5, 7), each = 3, times = 2)
mouse = factor(rep(1:18, times = 2))
sampleInfo = data.frame(cross, allele, time, mouse)

Ah, this is handy! Hopefully this will help with troubleshooting; at the very least, I can see it being helpful when I try to explain to people what the different coefficients mean.
If I'm understanding you correctly, you're suggesting creating an additional variable like so:
Assuming so, here are the results of a few test design formulae:
~0+cross+time+crossTime:mouseInGroup+allele+allele:time: not full rank~0+crossTime+crossTime:mouseInGroup+allele+allele:time: not full rank~0+crossTime+crossTime:mouseInGroup+allele+allele:crossTime: full rank, but I'm not sure how I'd extricate "here is how ASE changes over time" fromallele:crossTimewithout like, isolating theBxC-containing coefficients and doing a linear regression on those.~0+cross+time+crossTime:mouseInGroup+allele+allele:crossTime: not full rank, not really sure what I was hoping for with this one.It seems like the issue is that, if I use
crossandtimeseparately in addition tocrossTime, they end up describing the same groups. So, still not sure how to build a working design formula for this experiment. Maybe it'll help if I try to start from an ideal model and derive a design formula from there? It may also be time to change tools; I wanted to use DESeq2 because of its normalization strategy, but it might work better to just calculate FPKM -> log2(allelic ratio) and then use something to model & evaluate that...Hm, okay, in building a fake model by hand I think I'm realizing the problem:
mouseand its variants are ideally acting as basically a random effect variable to try and establish an imaginary "baseline expression" for each mouse. I think I have to effectively average that out and pick a variable to incorporate that average into, then figure out relevant coefficients based on that? Maybe? Confirmation would be appreciated; the joys of independently learning statistical modeling is I never feel sure I know what I'm doing. x_xYou just need the mouse identifier (controls baseline) and the group specific allelic effect.
You may want to speak with a local statistician, because you don't just want to find the first full rank matrix and run with that if it doesn't represent what you are after.
Alternatively you can examine the model matrix yourself and spot what is happening.