Design of time-course experiment in DESeq2

I am having issues with the design formula for a DESeq2 analysis of time-course data. I have a pilot experiment with 6 blood samples, 3 per condition (one at each time point, so no replicates), taken at 0, 12 and 24 h. I've gathered from the DESeq2 vignette and from reading forums that I should use the likelihood ratio test (LRT) for this type of analysis, but I'm unsure what design formula to specify. The questions I want to address are:

  1. What are the differences in gene expression between time points 0 vs 12, 12 vs 24 and 0 vs 24 h?
  2. What are the condition-specific differences in gene expression for each of these time-point comparisons?
  3. Do the differences in gene expression between time points change when the cell counts for each sample (granulocytes, lymphocytes, monocytes) are included as additional variables, compared with a model that omits them?

An example of my phenotype data is:

      timepoint condition  granulocytes    lymphocytes     monocytes
s1          0       con1      60             36              3 
s2         12       con1      46             47              5 
s3         24       con1      33             59              6
s4          0       con2      61             34              3   
s5         12       con2      49             50              6    
s6         24       con2      30             60              7
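A minimal base R sketch of the table above as a data.frame (values copied from the table; the object names `mm_numeric` and `mm_factor` are illustrative). It shows why the type of `timepoint` matters: if it is stored as a number, the design fits a single linear slope for time, and there are no levels for `contrast = c("timepoint", "12", "0")` to refer to.

```r
# Reconstruction of the phenotype table above (assumed column types).
pheno <- data.frame(
  timepoint    = c(0, 12, 24, 0, 12, 24),
  condition    = c("con1", "con1", "con1", "con2", "con2", "con2"),
  granulocytes = c(60, 46, 33, 61, 49, 30),
  lymphocytes  = c(36, 47, 59, 34, 50, 60),
  monocytes    = c(3, 5, 6, 3, 6, 7),
  row.names    = paste0("s", 1:6)
)

# Numeric timepoint: one coefficient, i.e. a linear trend over 0, 12, 24 h.
mm_numeric <- model.matrix(~ condition + timepoint, data = pheno)
print(colnames(mm_numeric))  # (Intercept), conditioncon2, timepoint

# Factor timepoint: one coefficient per non-reference time point (12 and 24 vs 0).
pheno$timepoint <- factor(pheno$timepoint, levels = c(0, 12, 24))
pheno$condition <- factor(pheno$condition, levels = c("con1", "con2"))
mm_factor <- model.matrix(~ condition + timepoint, data = pheno)
print(colnames(mm_factor))   # (Intercept), conditioncon2, timepoint12, timepoint24
```

The factor version is the one that matches level-based contrasts such as "12" vs "0".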

And my code looks like this:

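# timepoint must be a factor for contrast = c("timepoint", "12", "0") to work
pheno$timepoint <- factor(pheno$timepoint, levels = c(0, 12, 24))
pheno$condition <- factor(pheno$condition, levels = c("con1", "con2"))

ddsMat <- DESeqDataSetFromMatrix(countData = counts, 
                                 colData = pheno, 
                                 design = ~ condition + timepoint)

# 7 coefficients (intercept, condition, 2 time, 3 cell types) for 6 samples
ddsMat2 <- DESeqDataSetFromMatrix(countData = counts, 
                                  colData = pheno, 
                                  design = ~ condition + timepoint + granulocytes + lymphocytes + monocytes)

# reduced = ~ timepoint drops condition, so these LRTs test the condition term
ddsTC <- DESeq(ddsMat, test = "LRT", reduced = ~ timepoint)
ddsTC2 <- DESeq(ddsMat2, test = "LRT", reduced = ~ timepoint)

# test = "Wald" gives Wald tests for individual contrasts after an LRT fit
t0vs12 <- results(ddsTC, contrast = c("timepoint", "12", "0"), alpha = 0.05, test = "Wald")
t12vs24 <- results(ddsTC, contrast = c("timepoint", "24", "12"), alpha = 0.05, test = "Wald")
t0vs24 <- results(ddsTC, contrast = c("timepoint", "24", "0"), alpha = 0.05, test = "Wald")
con1vscon2 <- results(ddsTC, contrast = c("condition", "con1", "con2"), alpha = 0.05, test = "Wald")

# Same contrasts from the model with cell counts (suffix avoids overwriting)
t0vs12_cells <- results(ddsTC2, contrast = c("timepoint", "12", "0"), alpha = 0.05, test = "Wald")
t12vs24_cells <- results(ddsTC2, contrast = c("timepoint", "24", "12"), alpha = 0.05, test = "Wald")
t0vs24_cells <- results(ddsTC2, contrast = c("timepoint", "24", "0"), alpha = 0.05, test = "Wald")
con1vscon2_cells <- results(ddsTC2, contrast = c("condition", "con1", "con2"), alpha = 0.05, test = "Wald")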

If the analysis cannot be done without replicates, I can treat the 2 conditions as replicates of the same sample for the purposes of this pilot. I am also planning to repeat the experiment with replicates in the future, so I need to understand which design formula to use in that case.

Thanks in advance for help!


You have too few samples to answer a number of your questions. To understand this more completely, I'd strongly recommend consulting a statistician or someone familiar with setting up linear models in R.

You can certainly answer (1), but without replicates you cannot answer (2). Again, to understand why, please consult a statistician.
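One way to see the sample-size problem behind (2), sketched in base R under the assumption that "condition-specific time differences" means a condition-by-timepoint interaction: count the coefficients of the interaction design and compare them with the number of samples. DESeq2 needs residual degrees of freedom to estimate per-gene dispersion. The replicated layout below (2 samples per condition and time point) is hypothetical, not part of the pilot.

```r
# Pilot layout: one sample per condition x time point (6 samples).
pilot <- expand.grid(
  timepoint = factor(c(0, 12, 24)),
  condition = factor(c("con1", "con2"))
)

# Interaction design: intercept, condition, 2 time, 2 condition:time terms.
mm_pilot <- model.matrix(~ condition * timepoint, data = pilot)
resid_df_pilot <- nrow(pilot) - qr(mm_pilot)$rank  # 6 samples - 6 coefficients

# Hypothetical future design: 2 replicates of every condition x time point.
replicated <- pilot[rep(seq_len(nrow(pilot)), each = 2), ]
mm_rep <- model.matrix(~ condition * timepoint, data = replicated)
resid_df_rep <- nrow(replicated) - qr(mm_rep)$rank  # 12 samples - 6 coefficients

print(c(coefficients = ncol(mm_pilot),
        resid_df_pilot = resid_df_pilot,
        resid_df_replicated = resid_df_rep))
```

With the pilot layout the interaction model uses up every sample, leaving nothing to estimate variability from; replication is what frees up residual degrees of freedom.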

When you put continuous covariates into the design formula, you make a strong assumption: that the log counts change linearly with each covariate. If you think that's appropriate, you can add continuous covariates, but you have 6 samples and already 4 coefficients from time and condition alone (the intercept, one for condition and two for time). So you can add at most one more covariate and still have enough samples to fit the model.
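The coefficient counting can be checked directly in base R with `model.matrix()` and `qr()`, using the phenotype values from the question (the design names `base`, `plus_one` and `plus_all` are illustrative). This is a sketch of the arithmetic only; it does not fit any DESeq2 model, although DESeq2 will generally refuse a design whose model matrix is not full rank.

```r
pheno <- data.frame(
  timepoint    = factor(c(0, 12, 24, 0, 12, 24), levels = c(0, 12, 24)),
  condition    = factor(rep(c("con1", "con2"), each = 3)),
  granulocytes = c(60, 46, 33, 61, 49, 30),
  lymphocytes  = c(36, 47, 59, 34, 50, 60),
  monocytes    = c(3, 5, 6, 3, 6, 7)
)

designs <- list(
  base     = ~ condition + timepoint,
  plus_one = ~ condition + timepoint + granulocytes,
  plus_all = ~ condition + timepoint + granulocytes + lymphocytes + monocytes
)

# For each design: samples, coefficients, matrix rank and residual df.
summ <- t(sapply(designs, function(f) {
  mm <- model.matrix(f, data = pheno)
  rk <- qr(mm)$rank
  c(samples = nrow(mm), coefficients = ncol(mm), rank = rk,
    resid_df = nrow(mm) - rk)
}))
print(summ)
```

Adding one cell type leaves a single residual degree of freedom; adding all three gives 7 coefficients for 6 samples, so the rank falls short of the number of coefficients.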


Thanks for your reply! I thought the lack of samples might be the issue, but wanted to check if there was any way to answer these questions regardless.

