DESeq2 design formula with multiple factors
Helena • 0
@helena-23852

Hi all,

I have three RNA-Seq datasets on which I want to run a DESeq2 differential expression analysis. My main goals are to:

1. Compare treatment to control and see whether there are differences between the sexes
2. Compare treatment to control while controlling for sex and batch effects
3. Compare treatment to control and also test the difference between genotypes, controlling for sex and batch effects

The corresponding datasets, from the simplest to the most complex, are:

First,

subject    sex    condition
1      M    treatment
1      M      control
2      M    treatment
2      M      control
3      M    treatment
3      M      control
4      F    treatment
4      F      control


Second (subjects 1-4 are the same as in the first dataset),

subject    sex    condition    batch
1      M    treatment        C
1      M      control        C
2      M    treatment        C
2      M      control        C
3      M    treatment        C
3      M      control        C
4      F    treatment        C
4      F      control        C
5      M    treatment        B
5      M      control        B
6      M    treatment        B
6      M      control        B
7      F    treatment        A
7      F      control        A
8      F    treatment        A
8      F      control        A


Third (subjects 1-8 are the same as in the second dataset),

subject    sex    condition    batch    genotype
1      M    treatment        C           X
1      M      control        C           X
2      M    treatment        C           X
2      M      control        C           X
3      M    treatment        C           X
3      M      control        C           X
4      F    treatment        C           X
4      F      control        C           X
5      M    treatment        B           X
5      M      control        B           X
6      M    treatment        B           X
6      M      control        B           X
7      F    treatment        A           X
7      F      control        A           X
8      F    treatment        A           X
8      F      control        A           X
9      M    treatment        B           Y
9      M      control        B           Y
10      M    treatment        B           Y
10      M      control        B           Y
11      M    treatment        B           Y
11      M      control        B           Y


At first I did not use a paired-sample design, and all of the following designs work:

1. design = ~ sex + condition
2. design = ~ sex + condition + batch
3. design = ~ sex + condition + batch + genotype + genotype:condition

However, when I account for the pairing (each subject contributes one sample per condition), all of the following return an error saying that the model matrix is not full rank:

1. design = ~ subject + sex + condition
2. design = ~ subject + sex + condition + batch
3. design = ~ subject + sex + condition + batch + genotype + genotype:condition
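The rank problem can be reproduced without DESeq2. Below is a minimal sketch in base R, assuming the first dataset exactly as tabulated above; `coldata` and `rank_report` are hypothetical names introduced only for this illustration. It builds the model matrix for each design and compares the number of columns with the matrix rank. Because each subject has a single sex, the `sex` column is a linear combination of the intercept and the `subject` columns, so adding `subject` on top of `sex` leaves the matrix one rank short.

```r
# Sample table for the first dataset (values copied from the post).
coldata <- data.frame(
  subject   = factor(rep(1:4, each = 2)),
  sex       = factor(rep(c("M", "M", "M", "F"), each = 2)),
  condition = factor(rep(c("treatment", "control"), times = 4),
                     levels = c("control", "treatment"))
)

# Hypothetical helper: number of model matrix columns vs. its rank.
# A design is full rank only when the two numbers are equal.
rank_report <- function(formula, data) {
  mm <- model.matrix(formula, data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

# Unpaired design from the post: 3 columns, rank 3 (full rank).
unpaired <- rank_report(~ sex + condition, coldata)

# Paired design from the post: 6 columns, rank 5 (not full rank),
# because sexM equals the intercept minus the subject4 column.
paired_with_sex <- rank_report(~ subject + sex + condition, coldata)

# Dropping sex as a main effect, for illustration only: 5 columns, rank 5.
paired <- rank_report(~ subject + condition, coldata)

print(rbind(unpaired, paired_with_sex, paired))
```

The last design is shown only to make the confounding visible; it is not a recommendation for this experiment. With `subject` in the model the main effect of sex is absorbed by the subject terms, so whether a sex-specific treatment effect can still be estimated depends on how the interaction is set up, which is the kind of question the DESeq2 vignette covers in its sections on model matrices that are not full rank and on individuals nested within groups.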

Can someone guide me on how to set up correct designs? Any suggestions or recommended statistical reading would be appreciated. Thanks!

deseq2 RNA-Seq
@mikelove

For choosing a statistical design for your analysis, I would recommend collaborating with a local statistician. It's a really critical part of the analysis of complex datasets, and you want to make sure you understand the interpretation of results.


Yes, Michael, you are right. I discussed these questions with my supervisor and we think I should simplify my statistical design.