Question

Design of time-course experiment in DESeq2

ac_bioinformatics • 0

I am having issues with the design formula for a DESeq2 analysis of time-course data. I have a pilot experiment with 6 samples from blood, 3 per condition (one at each of the time points 0, 12 and 24 hrs), with no replicates. I've gathered from the DESeq2 vignette and from reading forums that I should be using the likelihood ratio test for this type of analysis, but I'm unsure what design formula to specify. The questions I want to address are:

1. What are the differences in gene expression between time points 0 vs 12, 12 vs 24 and 0 vs 24 hrs?
2. What are the condition-specific differences in gene expression for each of these time-point comparisons?
3. Do the differences in gene expression between time points change when cell numbers per sample are included in the model as additional variables, compared with a model that does not take them into account?

An example of my phenotype data is:

      timepoint condition  granulocytes    lymphocytes     monocytes
s1          0       con1      60             36              3 
s2         12       con1      46             47              5 
s3         24       con1      33             59              6
s4          0       con2      61             34              3   
s5         12       con2      49             50              6    
s6         24       con2      30             60              7

And my code looks like this:

ddsMat <- DESeqDataSetFromMatrix(countData = counts, 
                                 colData = pheno, 
                                 design = ~ condition + timepoint)

ddsMat2 <- DESeqDataSetFromMatrix(countData = counts, 
                                 colData = pheno, 
                                 design = ~ condition + timepoint + granulocytes + lymphocytes + monocytes)

ddsTC <- DESeq(ddsMat, test="LRT", reduced = ~ timepoint)

ddsTC2 <- DESeq(ddsMat2, test="LRT", reduced = ~ timepoint)

# Results from the model without cell-count covariates
t0vs12 <- results(ddsTC, contrast=c("timepoint","12","0"), alpha=.05, test="Wald")
t12vs24 <- results(ddsTC, contrast=c("timepoint","24","12"), alpha=.05, test="Wald")
con1vscon2 <- results(ddsTC, contrast=c("condition","con1","con2"), alpha=.05, test="Wald")

# Results from the model with cell-count covariates (separate names so the first set is not overwritten)
t0vs12_2 <- results(ddsTC2, contrast=c("timepoint","12","0"), alpha=.05, test="Wald")
t12vs24_2 <- results(ddsTC2, contrast=c("timepoint","24","12"), alpha=.05, test="Wald")
con1vscon2_2 <- results(ddsTC2, contrast=c("condition","con1","con2"), alpha=.05, test="Wald")

If the analysis cannot be done without replicates, then for the purposes of this experiment I can treat the 2 conditions as replicates of the same sample. I am also planning to repeat this with replicates in the future, so I need to understand what design formula to use in that case.

Thanks in advance for help!


updated 6.1 years ago by Michael Love • written 6.1 years ago by ac_bioinformatics

score 0 · Answer 1 · 2020-01-14

You have too few samples to answer a number of your questions - and to understand this more completely, I'd strongly recommend consulting with a statistician or someone familiar with setting up linear models in R.

You can certainly answer (1), but you have no replicates to answer (2). Again, to understand why, please consult with a statistician.
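
One way to see the replicate problem for (2) is to count coefficients against samples. The following is a minimal base-R sketch, not part of the original answer. It assumes `timepoint` and `condition` are both treated as factors, and it rebuilds the `pheno` table from the question. Condition-specific time effects need a `condition:timepoint` interaction term, and the sketch compares that design with the additive one.

```r
# Rebuild the sample table from the question.
# Treating both variables as factors is an assumption of this sketch.
pheno <- data.frame(
  timepoint = factor(rep(c(0, 12, 24), times = 2)),
  condition = factor(rep(c("con1", "con2"), each = 3)),
  row.names = paste0("s", 1:6)
)

# Additive model: one condition effect shared by all time points.
mm_add <- model.matrix(~ condition + timepoint, data = pheno)

# Interaction model: condition-specific time effects, as asked in question (2).
mm_int <- model.matrix(~ condition + timepoint + condition:timepoint, data = pheno)

# Residual degrees of freedom = number of samples - number of coefficients.
df_add <- nrow(mm_add) - ncol(mm_add)
df_int <- nrow(mm_int) - ncol(mm_int)

print(colnames(mm_int))
cat("additive model residual df:", df_add, "\n")
cat("interaction model residual df:", df_int, "\n")
```

The interaction design has six coefficients for six samples, so no residual degrees of freedom are left to estimate dispersion. With replicates at each condition and time point, a design of this form could be fitted and, as far as I recall from the time-course example in the DESeq2 workflow documentation, compared against the additive design with `test="LRT"` and `reduced = ~ condition + timepoint`.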

When you put continuous covariates into the design formula, it makes a strong assumption that the log counts will be linear against your covariate. If you think that's appropriate, you can add a continuous covariate, but you have 6 samples, and 4 coefficients with time and condition alone. So you can add at most one more covariate and still have the minimal number of samples needed to run the model.
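
The coefficient counting above can be checked in base R without running DESeq2. This is a rough sketch that assumes `timepoint` is treated as a factor (which gives the 4 coefficients mentioned) and uses the cell-count values from the table in the question. The names `designs` and `summary_tab` are only illustrative.

```r
# Sample table from the question; timepoint as a factor is an assumption.
pheno <- data.frame(
  timepoint    = factor(c(0, 12, 24, 0, 12, 24)),
  condition    = factor(c("con1", "con1", "con1", "con2", "con2", "con2")),
  granulocytes = c(60, 46, 33, 61, 49, 30),
  lymphocytes  = c(36, 47, 59, 34, 50, 60),
  monocytes    = c(3, 5, 6, 3, 6, 7),
  row.names    = paste0("s", 1:6)
)

# Three candidate designs: no covariates, one covariate, all three covariates.
designs <- list(
  base     = ~ condition + timepoint,
  one_cov  = ~ condition + timepoint + granulocytes,
  all_covs = ~ condition + timepoint + granulocytes + lymphocytes + monocytes
)

# For each design, report samples, coefficients, and the rank of the design matrix.
summary_tab <- t(sapply(designs, function(f) {
  mm <- model.matrix(f, data = pheno)
  c(samples = nrow(mm), coefficients = ncol(mm), rank = qr(mm)$rank)
}))
print(summary_tab)
```

With one covariate there are 5 coefficients for 6 samples, leaving a single residual degree of freedom. With all three covariates the design asks for 7 coefficients from 6 samples, so the design matrix cannot be full rank and the model cannot be fitted.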