I am trying to use DESeq2 to simulate time course RNA seq dataset.
Let's assume we have a time course dataset with 1000 genes, 8 time points and 3 replicates per time point. Here each time point is a 'condition' amounting to a total of 8 conditions. The total number of samples are 8x3 = 24.
- Feed a count matrix of size (1000x24) into DESeq.
- Get one mean per gene per sample (hence 1000x24 = 24k mean values) and one dispersion estimate per gene (hence 1000 dispersion values).
- Take a gene (let's say BRCA1) and a given replicate (let's say replicate 1).
- Take the 8 Negative Binomial distributions corresponding to each of the 8 time points of BRCA1 replicate 1.
- Sample values from each of the 8 distributions to create new time courses for BRCA1 replicate 1.
- After doing mean/dispersion estimation, how to get the mean values(per gene per sample, mu_ij) and dispersion values (per gene, alpha_i)?
- It'd also be nice to hear from other members of the community (most of whom I believe are way more experienced in DESeq than me) their thoughts on such an approach to simulating time course RNA seq data and if anyone is aware of a better approach to simulating time courses.
DESeq model that I have referred to:
K_ij ~ NB(mu_ij, alpha_i)
mu_ij = s_j q_ij
log2(q_ij) = x_j. beta_i