Dear DESeq2 community,
I have a question about handling siblings. The study I am currently analyzing has 2-4 siblings from each of 30 unrelated families (121 sequenced individuals in total). Half of the families were exposed to a treatment/condition and the other half were not (see the example table below). The treatment was applied to the parents and all siblings were reared in a common environment, so there was no way to split siblings within a family between treatment and control. As expected, if family is included as a fixed effect alongside condition, DESeq2 responds with the “model not full rank…” message, because each family falls entirely within one condition.
Individual | Condition | Family |
---|---|---|
1 | treated | A |
2 | treated | A |
3 | treated | A |
4 | treated | A |
5 | not treated | B |
6 | not treated | B |
7 | not treated | B |
8 | not treated | B |
9 | treated | C |
11 | treated | C |
12 | not treated | D |
13 | not treated | D |
... | treated | A |
121 | treated | Z |
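To make the rank problem concrete, here is a minimal sketch in plain Python (not DESeq2 code) that builds a treatment-coded design matrix for the first four families in the table and computes its rank. The helper names `design_matrix` and `matrix_rank` are made up for this illustration, and the coding (intercept, condition, one indicator per non-reference family) is my assumption about how such a design would be expanded.

```python
from fractions import Fraction

# Toy version of the sample table: (condition, family) per individual.
samples = [
    ("treated", "A"), ("treated", "A"), ("treated", "A"), ("treated", "A"),
    ("not treated", "B"), ("not treated", "B"),
    ("not treated", "B"), ("not treated", "B"),
    ("treated", "C"), ("treated", "C"),
    ("not treated", "D"), ("not treated", "D"),
]

def design_matrix(samples):
    """Columns: intercept, condition, then one indicator per non-reference family."""
    families = sorted({fam for _, fam in samples})
    rows = []
    for cond, fam in samples:
        row = [1, int(cond == "treated")]
        row += [int(fam == f) for f in families[1:]]  # first family is the reference
        rows.append(row)
    return rows

def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

X = design_matrix(samples)
n_columns = len(X[0])
rank = matrix_rank(X)
print(f"columns: {n_columns}, rank: {rank}")  # columns: 5, rank: 4
```

The rank is one less than the number of columns because the condition column equals the intercept minus the indicators of the untreated families, so the condition effect cannot be separated from the family effects.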
I see four approaches for moving forward:
1. Analyze the full data set and ignore possible effects of siblings (they do cluster in PCA space, but not excessively).
2. Collapse siblings as though they were technical replicates.
3. Use individuals nested within groups.
4. Bootstrap by randomly dropping all but one sibling per family, calculating DE genes, and repeating hundreds of times.
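For clarity, this is what I mean by collapsing in approach 2: summing raw counts across siblings so that each family becomes a single sample. As far as I understand, DESeq2's `collapseReplicates` performs this kind of summation for technical replicates. The sketch below is plain Python with made-up counts and a hypothetical helper `collapse_by_family`, only to illustrate the operation.

```python
from collections import defaultdict

# Hypothetical toy counts (made-up numbers): gene -> {individual: raw count}.
counts = {
    "gene1": {1: 10, 2: 12, 5: 3, 6: 4},
    "gene2": {1: 0, 2: 1, 5: 7, 6: 9},
}
# Individual -> family, following the layout of the example table.
family_of = {1: "A", 2: "A", 5: "B", 6: "B"}

def collapse_by_family(counts, family_of):
    """Sum raw counts over all siblings of each family, gene by gene."""
    collapsed = {}
    for gene, per_individual in counts.items():
        totals = defaultdict(int)
        for individual, n in per_individual.items():
            totals[family_of[individual]] += n
        collapsed[gene] = dict(totals)
    return collapsed

collapsed = collapse_by_family(counts, family_of)
print(collapsed)
# {'gene1': {'A': 22, 'B': 7}, 'gene2': {'A': 1, 'B': 16}}
```

After collapsing, the analysis would have 30 samples (one per family) and a design containing only condition.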
Based on my reading of the vignette, approach 3 is not appropriate here because the siblings were not split across treatment and control. Approach 4 seems excessively conservative and would result in a large reduction in power. I am therefore leaning towards analyzing the data with both approaches 1 and 2, carrying all (or most) downstream analyses forward for each, and comparing the results. Does this plan sound reasonable, or would you recommend a different approach? I am only interested in the treatment effect (compared to controls).
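In case it helps to see what approach 4 would involve, here is a rough sketch of the resampling step only: each iteration keeps one randomly chosen sibling per family, and the DE analysis would then be run on that subset. The helper `draw_one_per_family`, the seed, and the 200 iterations are arbitrary choices for illustration.

```python
import random

# Individual -> family, following the layout of the example table.
family_of = {
    1: "A", 2: "A", 3: "A", 4: "A",
    5: "B", 6: "B", 7: "B", 8: "B",
    9: "C", 11: "C",
    12: "D", 13: "D",
}

def draw_one_per_family(family_of, rng):
    """Return {family: individual}, keeping one random sibling per family."""
    by_family = {}
    for individual, family in family_of.items():
        by_family.setdefault(family, []).append(individual)
    return {
        family: rng.choice(sorted(members))
        for family, members in sorted(by_family.items())
    }

rng = random.Random(0)  # fixed seed so the draws are reproducible
draws = [draw_one_per_family(family_of, rng) for _ in range(200)]

# Each draw would define the subset of samples passed to one DE run.
print(draws[0])
```

Every subset contains only one individual per family, which is why I expect this approach to lose a lot of power compared with using all 121 individuals.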
Many thanks!