Question: DESeq2 design matrix for timecourse
Asked 2.7 years ago by axel.thieffry (Copenhagen University, Denmark):

 

Hi,

Being new to DESeq2 and differential expression analysis, I am a bit confused about how to set up the design matrix for my experiment, as well as the downstream tests/contrasts.

I have:
- 3 genotypes: H, R and W. H and R are both exosome-related mutants, so they are related; W is the wildtype.
- 3 timepoints: 0, 10 and 30 minutes.
- 2 conditions: treated and untreated. Timepoint 0 is untreated, while 10 and 30 are treated.
- 3 biological replicates per sample
Total = 24 libraries

Here is the design matrix I came up with:

         genotype timepoint replicate condition exosome
H_0_R1          H         0         1         U     mut
H_0_R2          H         0         2         U     mut
H_0_R3          H         0         3         U     mut
H_10_R1         H        10         1         T     mut
H_10_R2         H        10         2         T     mut
H_10_R3         H        10         3         T     mut
H_30_R1         H        30         1         T     mut
H_30_R2         H        30         2         T     mut
H_30_R3         H        30         3         T     mut
R_0_R1          R         0         1         U     mut
R_0_R2          R         0         2         U     mut
R_0_R3          R         0         3         U     mut
R_30_R1         R        30         1         T     mut
R_30_R2         R        30         2         T     mut
R_30_R3         R        30         3         T     mut
WT_0_R1         W         0         1         U      ok
WT_0_R2         W         0         2         U      ok
WT_0_R3         W         0         3         U      ok
WT_10_R1        W        10         1         T      ok
WT_10_R2        W        10         2         T      ok
WT_10_R3        W        10         3         T      ok
WT_30_R1        W        30         1         T      ok
WT_30_R2        W        30         2         T      ok
WT_30_R3        W        30         3         T      ok

(As you can see, I don't have the R genotype at treated timepoint 10 (i.e. R_10_R1, R2 & R3).)

My first question is: do I need to include replicates as a factor? These are just biological replicates, not independent experiments, and timepoints are not paired (i.e. WT_0_R1 is not the same biological material as WT_10_R1, which would have been sampled twice). I believe not, but I'd like confirmation.

Secondly, how should I write the design formula to answer questions such as:
- Effect of being an exosome mutant (independent of anything else)?
- Effect of the treatment (independent of anything else)?
- Effect of the timecourse?

I am very confused about (most probably very basic) concepts such as: when I want to investigate the effect of being an exosome mutant, should the analysis account for all the other factors, or should those factors be left out entirely (given that they are present in both mutant and wt)?

Thanks in advance for any help!

 

 

Answer: DESeq2 design matrix for timecourse
Answered 2.7 years ago by Michael Love (United States):

You shouldn't include replicate as a factor here because there is no correspondence between rep=1,2,3 across the samples. It would only make sense if the rep 1's were related somehow, but they are not.

I think the most natural design here is ~genotype + timepoint + genotype:timepoint. The reason is that the genotype and timepoint variables already explain the other variables you have specified: condition is just timepoint > 0, and exosome is just genotype != W. You can't add more variables to a design if they are linearly dependent on variables already in the design (you can think about this as "can these be constructed with linear operations from existing variables in the design?").
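To make the linear-dependence point concrete, here is a minimal base-R sketch. The `coldata` data frame is rebuilt from the sample table in the question, and the helper name `rank_report` is hypothetical (neither comes from the original post). It compares the number of columns of each model matrix with its rank; a design can only be fitted when the two agree.

```r
# Rebuild the sample table: 3 genotypes x 3 timepoints x 3 replicates,
# minus the unsampled R genotype at timepoint 10 (24 libraries).
coldata <- expand.grid(replicate = 1:3,
                       timepoint = c(0, 10, 30),
                       genotype  = c("H", "R", "W"))
coldata <- coldata[!(coldata$genotype == "R" & coldata$timepoint == 10), ]
coldata$genotype  <- relevel(factor(coldata$genotype), ref = "W")  # wildtype as reference
coldata$timepoint <- factor(coldata$timepoint)
coldata$condition <- factor(ifelse(coldata$timepoint == "0", "U", "T"))
coldata$exosome   <- factor(ifelse(coldata$genotype == "W", "ok", "mut"))

# Hypothetical helper: number of columns vs. rank of the model matrix.
rank_report <- function(formula, data) {
  mm <- model.matrix(formula, data)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

rank_report(~ genotype + timepoint, coldata)              # 5 columns, rank 5
rank_report(~ genotype + timepoint + condition, coldata)  # 6 columns, rank 5
rank_report(~ genotype + timepoint + exosome, coldata)    # 6 columns, rank 5

# The interaction design has one column with no samples behind it,
# because R was never sampled at 10 minutes.
mm_full  <- model.matrix(~ genotype + timepoint + genotype:timepoint, coldata)
all_zero <- apply(mm_full, 2, function(x) all(x == 0))
names(which(all_zero))            # "genotypeR:timepoint10"
mm_full <- mm_full[, !all_zero]   # 8 columns, rank 8
```

Adding `condition` or `exosome` adds a column without adding rank, which is exactly the redundancy described above. The all-zero interaction column is a separate wrinkle caused by the missing R/10 samples; the DESeq2 vignette section on levels without samples describes dropping such a column and supplying the edited matrix directly.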

There are a number of contrasts you can perform from the above design, including Wald tests for differences at specific times for each genotype and a likelihood ratio test for any differences in the time course across genotypes. We have examples of such a time course design in our RNA-seq workflow. If you have specific questions on how to build or interpret results, you may want to partner with a local statistician who can help interpret the contrasts for you.
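As a rough sketch of how those two kinds of test might be set up (an illustration under stated assumptions, not code from the workflow): the function name `fit_timecourse` is hypothetical, and `counts` is assumed to be a gene-by-sample matrix of raw counts whose column names match the row names of `coldata`. Because the R genotype was not sampled at 10 minutes, the interaction formula yields an all-zero column, so this sketch passes explicit model matrices to the `full` and `reduced` arguments of `DESeq()` rather than formulas.

```r
# Sample table rebuilt from the question (24 libraries; R at timepoint 10 is absent).
coldata <- expand.grid(replicate = 1:3,
                       timepoint = c(0, 10, 30),
                       genotype  = c("H", "R", "W"))
coldata <- coldata[!(coldata$genotype == "R" & coldata$timepoint == 10), ]
prefix  <- ifelse(coldata$genotype == "W", "WT", as.character(coldata$genotype))
rownames(coldata) <- paste0(prefix, "_", coldata$timepoint, "_R", coldata$replicate)
coldata$genotype  <- relevel(factor(coldata$genotype), ref = "W")
coldata$timepoint <- factor(coldata$timepoint)

# Full model with interaction; drop the all-zero column for the unsampled R:10 cell.
mm_full <- model.matrix(~ genotype + timepoint + genotype:timepoint, coldata)
mm_full <- mm_full[, colSums(mm_full != 0) > 0]
# Reduced model without interaction, for the likelihood ratio test.
mm_reduced <- model.matrix(~ genotype + timepoint, coldata)

# Hypothetical wrapper; requires the DESeq2 package to be installed.
fit_timecourse <- function(counts, coldata, mm_full, mm_reduced) {
  stopifnot(identical(colnames(counts), rownames(coldata)))
  # The formula here is only a full-rank placeholder; the matrices below
  # define the models that are actually fitted.
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ genotype + timepoint)
  dds <- DESeq2::DESeq(dds, test = "LRT", full = mm_full, reduced = mm_reduced)
  list(dds               = dds,
       lrt               = DESeq2::results(dds),       # any genotype-specific time effect
       coefficient_names = DESeq2::resultsNames(dds))  # available terms for Wald tests
}
```

With a real count matrix, `fit <- fit_timecourse(counts, coldata, mm_full, mm_reduced)` would give the likelihood ratio test in `fit$lrt`. A Wald test for a single term can then be requested with `DESeq2::results(fit$dds, name = <one entry of fit$coefficient_names>, test = "Wald")`; check the printed coefficient names rather than guessing them.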


That is much clearer to me now, thanks for the explanations!

(Reply written 2.7 years ago by axel.thieffry)