Question: DESeq2 model design with unequal number of time points
0
2.7 years ago by
Dmitry0
Dmitry0 wrote:

Dear community,

I am having troubles with the DESeq2 analysis of time course experiments and was thinking someone could point me to the right direction.

I have 3 different conditions. For conditions 1 and 2 there are time points 0, 8, 16, 24, 48, 72 hours. For condition 3 there are time points 0, 8, 24, 48, 72, 96, 120 hours.

While trying to design a model

dds <-DESeqDataSet(dataset, ~condition + time + condition:time) as described here (in the time course section), DESeq2 throws an error

"..the model matrix is not full rank, so the model cannot be fit as specified.
Levels or combinations of levels without any samples have resulted in
column(s) of zeros in the model matrix".

I guess this is because some of the conditions do not have all the time points?

I tried the following:

model <- model.matrix(~condition + time + condition:time, colData(data_dds))

and removing the columns that have only zeros, but since I do not see all the conditions in this model matrix (the first one is always hidden), supplying this matrix to DESeqDataSet gives the same error.

Is there a way to make such a model?

Are there any alternative ways to test condition-specific and time/timepoint-specific effects?

deseq2 R multiple time points • 875 views
modified 2.7 years ago by Ryan C. Thompson7.4k • written 2.7 years ago by Dmitry0
1

If you have a specific 'shape' of time-profile you want to investigate, then maybe using time as a continuous variable would allow you to compensate for the 'missing' timepoints.  If you kept the time term linear, then you may be able to detect genes that have differential gradient (on a log-like scale).  Translating the time-variable by a specific offset would mean the intercept term would be the difference between the fitted curves at the timepoint corresponding to the (negative) offset, effecting interpolation.  I've tried this unsuccessfully in the past, in cases with fewer timepoints, but it might be worth try.

Yes, I was considering to use time as a continuous variable. But since I am just starting learning bioinformatics, I am still not knowledgeable enogh to do this on my own:) Maybe I'll try if I find a good tutorial.

Answer: DESeq2 model design with unequal number of time points
1
2.7 years ago by
Scripps Research, La Jolla, CA
Ryan C. Thompson7.4k wrote:

Yes, the reason this is happening is because of the missing/mismatched time points. As you point out, when working directly with a design matrix, you can manually remove the columns of all zeros and still get a functional design matrix, although I haven't tried this in a design with interaction terms, and it's possible that doing so will change the interpretation of some of the coefficients in unexpected ways.

There are ways to trick DESeq2 into using your custom design matrix, but in my opinion, the most straightforward way to fit this model is to make a single factor called "group" that is the concatenation of condition and time: group <- factor(paste(condition, time, sep=".")). Then use ~group as your design, and form all the contrasts that you want to test explicitly.

Keep in mind that because the time points are different, you can't always directly compare results between conditions. For example, if you found 100 DE genes changing with time in condition 1 and only 50 genes changing with time in condition 2, you wouldn't be able to conclude that condition 1 had more DE genes, unless you restricted your tests to only the common time points.

Thank you for a possible solution, Ryan! I am going to try it. The reason I am keeping all the time points is to look for changes in gene expression within one condition. But of course, for comparison between all 3 conditions I am going to use only matching time points.

One more thing I should mention: I believe DESeq2 automatically sets betaPrior=FALSE for interaction models. Since this is definitely an interaction model, you should set betaPrior=FALSE manually, since there's no way for DESeq2 to know that it's an interaction model when you concatenate the two factors into a single "group" variable.

A few notes here, this is an area that is not rigorously explained anywhere in my documentation I think. The issue with interaction terms and priors is only when the model matrix contains the interaction. It wouldn't be a problem for a model matrix formed with ~group.

That said, in the next release (April 2017), the DESeq() function will by default have betaPrior=FALSE, and then log fold change moderation will be moved to a separate function lfcShrink(), where you could either specify the name/number of a coefficient to shrink, or a contrast. We also will have some interesting new estimators available through lfcShrink(), maybe for testing in the next devel cycle.