DESeq2 design matrix for timecourse
athief ▴ 20
@axelthieffry-11787
Copenhagen

 

Hi,

Being new to DESeq2 and differential expression analysis, I am a bit confused about how to set up the design matrix for my experiment, and about the downstream tests/contrasts.

I have:
- 3 genotypes: H, R and W. H and R are both exosome-related mutants, so they are related. W is the wildtype.
- 3 timepoints : 0, 10 and 30 minutes.
- 2 conditions : treated and untreated. Timepoint 0 is untreated, while 10 and 30 are treated.
- 3 biological replicates per sample
Total = 24 libraries

Here is the design matrix I came up with:

         genotype timepoint replicate condition exosome
H_0_R1          H         0         1         U     mut
H_0_R2          H         0         2         U     mut
H_0_R3          H         0         3         U     mut
H_10_R1         H        10         1         T     mut
H_10_R2         H        10         2         T     mut
H_10_R3         H        10         3         T     mut
H_30_R1         H        30         1         T     mut
H_30_R2         H        30         2         T     mut
H_30_R3         H        30         3         T     mut
R_0_R1          R         0         1         U     mut
R_0_R2          R         0         2         U     mut
R_0_R3          R         0         3         U     mut
R_30_R1         R        30         1         T     mut
R_30_R2         R        30         2         T     mut
R_30_R3         R        30         3         T     mut
WT_0_R1         W         0         1         U      ok
WT_0_R2         W         0         2         U      ok
WT_0_R3         W         0         3         U      ok
WT_10_R1        W        10         1         T      ok
WT_10_R2        W        10         2         T      ok
WT_10_R3        W        10         3         T      ok
WT_30_R1        W        30         1         T      ok
WT_30_R2        W        30         2         T      ok
WT_30_R3        W        30         3         T      ok

(As you can see, I don't have the R genotype at treated timepoint 10, i.e. there are no R_10_R1, R_10_R2 or R_10_R3 libraries.)

My first question is: do I need to include replicate as a factor? These are just biological replicates, not independent experiments, and timepoints are not paired (i.e. WT_0_R1 and WT_10_R1 are not the same biological material sampled twice). I believe not, but I'd like confirmation.

Secondly, how should I write the design formula to answer questions such as:
- Effect of being an exosome mutant (independent of anything else)?
- Effect of the treatment (independent of anything else)?
- Effect of the timecourse?

I am very confused about some (most probably very basic) concepts. For example, when I want to investigate the effect of being an exosome mutant, should the analysis account for all the other factors, or should those factors be left out entirely, given that they occur in both mutant and wildtype?

Thanks in advance for any help!

 

 

@mikelove
United States

You shouldn't include replicate as a factor here, because there is no correspondence between rep = 1, 2, 3 across the samples. It would only make sense if the rep 1's were related somehow, but they are not.

I think the most natural design here is ~genotype + timepoint + genotype:timepoint. The reason is that the genotype and timepoint variables already explain the other variables you have specified: condition is just timepoint > 0, and exosome is just genotype != W. You can't add more variables to a design if they are linearly dependent on variables already in the design (you can think about this as "can these be constructed with linear operations from existing variables in the design?").
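
The linear dependence can be checked directly in base R. The sketch below rebuilds the sample table from the question and compares the number of columns of each model matrix with its rank. The helper `rank_of` is hypothetical, written only for this illustration, and W is assumed to be the reference level.

```r
# Rebuild the 24-sample table from the question (no R genotype at timepoint 10).
coldata <- expand.grid(replicate = 1:3,
                       timepoint = c(0, 10, 30),
                       genotype  = c("H", "R", "W"))
coldata <- subset(coldata, !(genotype == "R" & timepoint == 10))

# Assumption: wildtype (W) and timepoint 0 are the reference levels.
coldata$genotype  <- factor(coldata$genotype, levels = c("W", "H", "R"))
coldata$timepoint <- factor(coldata$timepoint)

# The two derived variables from the question.
coldata$condition <- factor(ifelse(coldata$timepoint == "0", "U", "T"),
                            levels = c("U", "T"))
coldata$exosome   <- factor(ifelse(coldata$genotype == "W", "ok", "mut"),
                            levels = c("ok", "mut"))

# Hypothetical helper: number of columns vs. rank of the model matrix.
rank_of <- function(formula) {
  mm <- model.matrix(formula, data = coldata)
  c(columns = ncol(mm), rank = qr(mm)$rank)
}

rank_of(~ genotype + timepoint)                        # 5 columns, rank 5
rank_of(~ genotype + timepoint + condition + exosome)  # 7 columns, rank 5
rank_of(~ genotype + timepoint + genotype:timepoint)   # 9 columns, rank 8
```

Adding condition and exosome adds two columns but no rank: conditionT equals timepoint10 + timepoint30, and exosomemut equals genotypeH + genotypeR. The interaction design is also one short of full rank for this particular data set, because the genotypeR:timepoint10 column is all zeros (that group has no samples). The DESeq2 vignette describes handling such "levels without samples" by dropping the all-zero column from a user-supplied model matrix.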

There are a number of contrasts you can perform from the above design, including Wald tests for differences at specific times for each genotype and a likelihood ratio test for any differences in the time course across genotypes. We have examples of such a time course design in our RNA-seq workflow. If you have specific questions on how to build or interpret results, you may want to partner with a local statistician who can help interpret the contrasts for you.
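
As a rough sketch of the likelihood ratio test mentioned above, the code below builds full and reduced model matrices for this sample layout and wraps the DESeq2 calls in a function. Several things are assumptions rather than part of the answer: the function name `fit_timecourse_lrt` is hypothetical, `counts` stands for your own integer count matrix with columns in the same order as the rows of `coldata`, and user-supplied model matrices are used only because the R genotype has no samples at timepoint 10.

```r
# Rebuild the 24-sample table from the question (no R genotype at timepoint 10).
coldata <- expand.grid(replicate = 1:3,
                       timepoint = c(0, 10, 30),
                       genotype  = c("H", "R", "W"))
coldata <- subset(coldata, !(genotype == "R" & timepoint == 10))
coldata$genotype  <- factor(coldata$genotype, levels = c("W", "H", "R"))
coldata$timepoint <- factor(coldata$timepoint)

# Full model: genotype-specific time courses.
mm_full <- model.matrix(~ genotype + timepoint + genotype:timepoint,
                        data = coldata)
# Drop all-zero columns (here: genotypeR:timepoint10, a group with no samples).
all_zero <- apply(mm_full, 2, function(x) all(x == 0))
mm_full  <- mm_full[, !all_zero]

# Reduced model: same time course in every genotype.
mm_reduced <- model.matrix(~ genotype + timepoint, data = coldata)

# Hypothetical wrapper; requires the DESeq2 package and a real count matrix.
fit_timecourse_lrt <- function(counts, coldata, mm_full, mm_reduced) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = coldata,
                                        design    = ~ genotype + timepoint)
  # LRT: full vs. reduced model, both given as matrices.
  dds <- DESeq2::DESeq(dds, test = "LRT",
                       full = mm_full, reduced = mm_reduced)
  DESeq2::results(dds)
}
```

With a count matrix at hand, `res <- fit_timecourse_lrt(counts, coldata, mm_full, mm_reduced)` would give one p-value per gene for "the time course differs between genotypes in some way". For Wald tests of individual coefficients, `resultsNames()` on the fitted object lists the names that can be passed to `results()`.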


That is much clearer to me now, thanks for the explanations!
