Hi all,

I have three RNA-Seq datasets that I want to analyze for differential expression with DESeq2. My main goals are to:

- Compare treatment to control and see whether there are differences between the sexes
- Compare treatment to control while controlling for sex and batch effects
- Compare treatment to control and also test the difference between genotypes, controlling for sex and batch effects

The corresponding datasets, from the simplest to the most complex, are:

First,

```
subject sex condition
1 M treatment
1 M control
2 M treatment
2 M control
3 M treatment
3 M control
4 F treatment
4 F control
```

Second (subjects 1-4 are the same as in the first dataset),

```
subject sex condition batch
1 M treatment C
1 M control C
2 M treatment C
2 M control C
3 M treatment C
3 M control C
4 F treatment C
4 F control C
5 M treatment B
5 M control B
6 M treatment B
6 M control B
7 F treatment A
7 F control A
8 F treatment A
8 F control A
```

Third (subjects 1-8 are the same as in the second dataset),

```
subject sex condition batch genotype
1 M treatment C X
1 M control C X
2 M treatment C X
2 M control C X
3 M treatment C X
3 M control C X
4 F treatment C X
4 F control C X
5 M treatment B X
5 M control B X
6 M treatment B X
6 M control B X
7 F treatment A X
7 F control A X
8 F treatment A X
8 F control A X
9 M treatment B Y
9 M control B Y
10 M treatment B Y
10 M control B Y
11 M treatment B Y
11 M control B Y
```

At first I did not use a paired-sample design, and all of the following designs work:

- design = ~ sex + condition
- design = ~ sex + condition + batch
- design = ~ sex + condition + batch + genotype + genotype:condition

However, when I account for the pairing (each subject contributes both a treatment and a control sample), all of the following return an error saying that the model matrix is not full rank:

- design = ~ subject + sex + condition
- design = ~ subject + sex + condition + batch
- design = ~ subject + sex + condition + batch + genotype + genotype:condition
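
To make the error concrete, here is a minimal sketch in plain Python (not DESeq2 or R code) that rebuilds the dummy columns for the first dataset and counts how many of them are linearly independent. The helper names `indicator_columns`, `design_matrix`, and `matrix_rank` are made up for illustration, and the treatment coding is only assumed to mirror what R's `model.matrix` does by default (first level in sorted order as the reference).

```python
from fractions import Fraction

# First dataset from this post: (subject, sex, condition)
samples = [
    (1, "M", "treatment"), (1, "M", "control"),
    (2, "M", "treatment"), (2, "M", "control"),
    (3, "M", "treatment"), (3, "M", "control"),
    (4, "F", "treatment"), (4, "F", "control"),
]
FIELD = {"subject": 0, "sex": 1, "condition": 2}


def indicator_columns(values):
    """One 0/1 column per level, skipping the first (reference) level."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for v in values] for level in levels[1:]]


def design_matrix(rows, terms):
    """Intercept plus dummy columns for each main-effect term (assumed coding)."""
    columns = [[1] * len(rows)]  # intercept
    for term in terms:
        columns.extend(indicator_columns([row[FIELD[term]] for row in rows]))
    return [list(r) for r in zip(*columns)]  # transpose to sample rows


def matrix_rank(matrix):
    """Rank by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


for terms in (["sex", "condition"],
              ["subject", "sex", "condition"],
              ["subject", "condition"]):
    X = design_matrix(samples, terms)
    print("~", " + ".join(terms), "| columns:", len(X[0]), "| rank:", matrix_rank(X))
```

Under these assumptions, the paired design with `sex` has one more column than its rank. Sex is constant within each subject, so the sex column is already determined by the subject columns (here, the male indicator equals the intercept minus the subject 4 indicator). The same reasoning would apply to `batch` and `genotype` in the larger datasets, since each subject belongs to exactly one batch and one genotype.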

Can someone guide me on how to set up correct designs? Any suggestions or recommended statistical reading would be appreciated. Thanks!

Yes, Michael, you are right. I discussed these questions with my supervisor and we think I should simplify my statistical design.