DESeq2 design matrix for timecourse
athief ▴ 10
@axelthieffry-11787
Copenhagen

Hi,

Being new to DESeq2 and differential expression analysis, I am a bit confused about how to set up the design matrix for my experiment, as well as the downstream tests/contrasts.

I have:
- 3 genotypes : H, R and W. H and R are both exosome-related mutants, so they are related. W is wild type.
- 3 timepoints : 0, 10 and 30 minutes.
- 2 conditions : treated and untreated. Timepoint 0 is untreated, while 10 and 30 are treated.
- 3 biological replicates per sample
Total = 24 libraries

Here is the design matrix I came up with:

         genotype timepoint replicate condition exosome
H_0_R1          H         0         1         U     mut
H_0_R2          H         0         2         U     mut
H_0_R3          H         0         3         U     mut
H_10_R1         H        10         1         T     mut
H_10_R2         H        10         2         T     mut
H_10_R3         H        10         3         T     mut
H_30_R1         H        30         1         T     mut
H_30_R2         H        30         2         T     mut
H_30_R3         H        30         3         T     mut
R_0_R1          R         0         1         U     mut
R_0_R2          R         0         2         U     mut
R_0_R3          R         0         3         U     mut
R_30_R1         R        30         1         T     mut
R_30_R2         R        30         2         T     mut
R_30_R3         R        30         3         T     mut
WT_0_R1         W         0         1         U      ok
WT_0_R2         W         0         2         U      ok
WT_0_R3         W         0         3         U      ok
WT_10_R1        W        10         1         T      ok
WT_10_R2        W        10         2         T      ok
WT_10_R3        W        10         3         T      ok
WT_30_R1        W        30         1         T      ok
WT_30_R2        W        30         2         T      ok
WT_30_R3        W        30         3         T      ok

(As you can see, I don't have the R genotype at treated timepoint 10 (i.e. R_10_R1, R2 & R3).)

My first question is: do I need to include replicates as a factor? These are just biological replicates, not independent experiments, and timepoints are not paired (i.e. WT_0_R1 and WT_10_R1 are not the same biological material sampled twice). I believe not, but I'd like confirmation.

Secondly, how should I write the design formula to answer questions such as:
- Effect of being an exosome-mutant (independent of anything else) ?
- Effect of the treatment (independent of anything else) ?
- Effect of the timecourse ?

I am very confused about (most probably very basic) concepts such as: when I want to investigate the effect of being an exosome mutant, should the analysis account for all the other factors, or should those factors be left out entirely (given that they are present in both mutant and wt)?

Thanks in advance for any help!

@mikelove
United States

You shouldn't include replicate as a factor here because there is no correspondence between rep=1,2,3 across the samples. It would only make sense if the rep 1 samples were related somehow, but they are not.

I think the most natural design here is ~genotype + timepoint + genotype:timepoint. The reason is that the genotype and timepoint variables explain the other variables you have specified: condition is just timepoint > 0, and exosome is just genotype != W. You can't add more variables to a design if they are linearly dependent on variables already in the design (you can think about this as "can these be constructed with linear operations from existing variables in the design?").
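
To make the linear dependence concrete, here is a minimal base R sketch that rebuilds the sample table from the question and inspects the model matrices. The object names (coldata, m_all, m_int, m_full) are made up for illustration, and choosing W and timepoint 0 as reference levels is an assumption. It also checks something the sample table implies: because there are no R samples at timepoint 10, the interaction design should contain one all-zero column, which as far as I know has to be dropped before DESeq2 will accept the design as full rank.

```r
# Sample table from the question: H and W have 3 timepoints, R lacks timepoint 10.
coldata <- data.frame(
  genotype  = factor(rep(c("H", "R", "W"), times = c(9, 6, 9)),
                     levels = c("W", "H", "R")),   # W as reference (assumption)
  timepoint = factor(c(rep(c(0, 10, 30), each = 3),
                       rep(c(0, 30),     each = 3),
                       rep(c(0, 10, 30), each = 3)))
)
# The two derived variables from the original table.
coldata$condition <- factor(ifelse(coldata$timepoint == "0", "U", "T"),
                            levels = c("U", "T"))
coldata$exosome   <- factor(ifelse(coldata$genotype == "W", "ok", "mut"),
                            levels = c("ok", "mut"))

# Design with every variable: more columns than the rank of the matrix.
m_all <- model.matrix(~ genotype + timepoint + condition + exosome, coldata)
ncol(m_all)
qr(m_all)$rank

# condition and exosome are sums of columns that are already present.
all(m_all[, "conditionT"] == m_all[, "timepoint10"] + m_all[, "timepoint30"])
all(m_all[, "exosomemut"] == m_all[, "genotypeH"] + m_all[, "genotypeR"])

# Interaction design: find columns with no samples behind them and drop them.
m_int     <- model.matrix(~ genotype + timepoint + genotype:timepoint, coldata)
zero_cols <- colnames(m_int)[colSums(m_int != 0) == 0]
zero_cols
m_full    <- m_int[, !(colnames(m_int) %in% zero_cols)]
qr(m_full)$rank == ncol(m_full)
```

If the rank is smaller than the number of columns, at least one variable is redundant, which is the situation described above for condition and exosome.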

There are a number of contrasts you can perform from the above design, including Wald tests for differences at specific times for each genotype and a likelihood ratio test for any differences in the time course across genotypes. We have examples of such a time course design in our RNA-seq workflow. If you have specific questions on how to build or interpret results, you may want to partner with a local statistician who can help interpret the contrasts for you.
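
As a rough sketch of what those two kinds of tests could look like in code, under several assumptions: the counts below are placeholder data from makeExampleDESeqDataSet (not real data), all object names are hypothetical, W and timepoint 0 are taken as reference levels, and the model matrix is supplied by hand with the empty R-at-10-minutes interaction column removed, since that group has no samples. Coefficient names for a user-supplied matrix may differ between versions, so the sketch looks the name up in resultsNames() instead of hard-coding it.

```r
library(DESeq2)

# Sample table as in the question (R has no samples at timepoint 10).
coldata <- data.frame(
  genotype  = factor(rep(c("H", "R", "W"), times = c(9, 6, 9)),
                     levels = c("W", "H", "R")),
  timepoint = factor(c(rep(c(0, 10, 30), each = 3),
                       rep(c(0, 30),     each = 3),
                       rep(c(0, 10, 30), each = 3)))
)

# Placeholder counts only, standing in for the real count matrix.
set.seed(1)
cts <- counts(makeExampleDESeqDataSet(n = 200, m = 24))
rownames(coldata) <- colnames(cts)

# Full model with interaction, minus columns that have no samples behind them.
full    <- model.matrix(~ genotype + timepoint + genotype:timepoint, coldata)
full    <- full[, colSums(full != 0) > 0]
reduced <- model.matrix(~ genotype + timepoint, coldata)

# The formula here is only a full-rank placeholder; the matrices are used below.
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ genotype + timepoint)

# Likelihood ratio test: does the time course differ between genotypes at all?
dds_lrt <- DESeq(dds, test = "LRT", full = full, reduced = reduced)
res_lrt <- results(dds_lrt)

# Wald test on a single coefficient: is the 10 vs 0 min change in H
# different from the 10 vs 0 min change in W?
dds_wald  <- DESeq(dds, full = full)
coef_name <- grep("genotypeH.*timepoint10", resultsNames(dds_wald), value = TRUE)
res_H10   <- results(dds_wald, name = coef_name)
```

With this parameterization the main-effect timepoint coefficients describe the change over time in the reference genotype, and the interaction coefficients describe how each mutant's change differs from that, so it is worth printing resultsNames() and checking each coefficient's meaning before interpreting a results table.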


That is much clearer to me now, thanks for the explanations!