Hi,
I am finding an error with estimateDispersions() in DESeq2.
In my experimental design I have 3 factors: Treatment, Time and Depth
I start with a phyloseq object (PHYLUM) that I convert to DESeq (Phylum)
easily, including two of the factors in the design.
> Phylum=phyloseq_to_deseq2(PHYLUM,~ Time + Treatment)
> Phylum
class: DESeqDataSet
dim: 57 30
exptData(0):
assays(1): counts
rownames(57): OTU1 OTU2 ... OTU56 OTU57
rowData metadata column names(0):
colnames(30): T2_SF5C T25_SF1B ... T2_DP1A T2_DP1C
colData names(3): Treatment Time Depth
> colData(Phylum)
DataFrame with 30 rows and 3 columns
Treatment Time Depth
<factor> <factor> <factor>
T2_SF5C DOMNP 02d 25m
T25_SF1B C 25d 25m
T25_SF5B DOMNP 25d 25m
T25_SF4A DOM 25d 25m
T2_SF4A DOM 02d 25m
... ... ... ...
T25_DP5A DOMNP 25d 125m
T25_DP4B DOM 25d 125m
T25_DP5B DOMNP 25d 125m
T2_DP1A C 02d 125m
T2_DP1C C 02d 125m
All contrasts work fine with this design (~ Time + Treatment).
The problem comes when I try to perform interactions. I find and error when estimating dispersions (see below). Do you know how can I solve it? Is it related to the nature of my factors?
I have seen a previous post in which the same error was discussed but in that case the problem was the opposite: the error occurred with the normal contrast and was solved in the interaction. I am using DESeq 1.4.5.
Thanks so much!!!!
> design(Phylum) <- ~ Time + Treatment + Time:Treatment
> Phylum<-estimateSizeFactors(Phylum)
> Phylum<-estimateDispersions(Phylum,fitType="local")
gene-wise dispersion estimates
error: inv(): matrix appears to be singular
Error: inv(): matrix appears to be singular
Thanks for reporting this. The issue is that your model cannot be fit as specified (with the interaction term) because you only have control samples at the first time point. The reason you get this not-so-nice error message is that DESeq2 runs a test during dds object construction on the design, and then prints a nice message explaining this problem. I've now added another test for this problem at the estimateDispersions() step with a nice message (rather than "matrix appears to be singular"), in case the design has changed since object construction.
Here's how you can explore your model and some explanation on why it cannot be fit as specified:
The sum of the interaction terms for treatment DOM with time 2d and 25d are equal to the main effect for treatment DOM. This would not be the case if you had samples at time 0d for treatment DOM. Note that these interaction terms can be thought of as "how is the effect of DOM different at time 2d than at time 0d", so it makes sense that the model cannot be fit with these interaction terms, because the experiment didn't measure the effect of DOM at time 0d. A solution is to remove the interaction terms from the design.
Thanks Mike,
First, thanks so much for your help. This website is really good for every DESEQ user.
Regarding my problem: Now I understand.
Following your explanation, if I take out day 0, my dataset would allow to make comparisons between C, DOM and DOMNP in 02 and 25d right?
I do not have samples at time 0d for the two addition treatments because they are supposed (by definition of our experimental design) to be similar to the control (at least that is our assumption). Maybe what I can do is to "create" one sample DOM_00d and another DOMNP_00d and copy the data I have for C_00d. These samples will not be real but will represent our ecological assumptions. in our statistical test.
If I remove the interaction term as you suggest I will not be able to separate the general effect of treatments between the different sampling points (the effect of treatment is supposed to change at 25d compared to the effect at 02d), and that is important for us.
Also, I would like to perform a "time-series" analysis (similar to a RMANOVA test) so I think I do need to keep the interaction on my design...
What do you think?
Easiest is to just remove the two time 0 samples. You don't get much from them, only one degree of freedom for estimating dispersion, whereas you already have many residual degrees of freedom. And these two samples don't help you investigate the treatment effect or how it changes from 2d to 25d.
(The problem with using a design Time + Time:Treatment is that R's base function model.matrix -- which we make use of inside the DESeq2 functions -- will add a column of 0's to the model matrix. I can try to work on this but not in this release cycle.)
Thanks Mike,
I am going to try and figure out how to perform the RMANOVA, any advises?
For genes which vary over time due to treatment, use the likelihood ratio test (see vignette):
Hi Mike, thanks so much. I have now tried 2 different approaches: the Time-series approach (OPCION A) and the Time+Treatment interaction approach (OPTION B). I am not sure I understand completely the difference in the results I get. Maybe I do not need a Time-series design. Hope you can help me.
My main questions for this dataset are:
Which is the effect of the different treatments in general (all time points included)?
Which is the effect of the different treatments at each different time?
Which is the difference between DOM and DOMNP in general and in each time point?
Which is the effect of time?
This was my dataset:
Phylum=phyloseq_to_deseq2(PHYLUM,~ Time + Treatment + Time:Treatment)
OPCION A: TME-SERIES DESIGN
> Phylum= DESeq(Phylum, fitType="local", test="LRT", reduced= ~ Time + Treatment)
> results(Phylum)
log2 fold change: Time25d.TreatmentDOMNP
LRT p-value: '~ Time + Treatment + Time:Treatment' vs '~ Time + Treatment'
DataFrame with 231 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
OTU1 6.671043 1.284642 2.088933 1.6123348 0.8065735 0.9994033
OTU2 2.203649 1.425796 1.793131 2.2630540 0.6875035 0.9994033
Does the results() above test all times after Time 0 at once (that is the general effect of treatment during the complete incubation)?
Why does it say just time 25d?
How can I change it to test for the other level of Treatment (DOM)?
And to test DOM vs DOMNP?
Now I call resultsNames()
>resultsNames(Phylum)
#[1] "Intercept" "Time_02d_vs_00d" "Time_25d_vs_00d" "Treatment_DOM_vs_C" "Treatment_DOMNP_vs_C"
#[6] "Time02d.TreatmentDOM" "Time25d.TreatmentDOM" "Time02d.TreatmentDOMNP" "Time25d.TreatmentDOMNP"
- On one hand I have the main results:
The effects of the different times over the initial time, e.j:
results(Phylum, name="Time_02d_vs_00d")
and the effect for treatments over Control at time 0, ej:
results(Phylum, name="Treatment_DOM_vs_C")
Is this correct?
- On the other hand the following will return a results table with Wald tests of the additional treatment effect in time 25 (additional beyond the treatment effect at time 0)
Phylum <- nbinomWaldTest(Phylum, betaPrior=FALSE)
results(Phylum, name="Time25d.TreatmentDOM")
Is this similar to the first LRT results above, except now we are asking for a different effect of Treatment in a specific time, not in all times?
The problem is that it does not give me an option to test the difference between DOM and DOMNP, right?
I thought that a time series approach would be appropriate since they are the same individuals sampled at different times...but I am starting to think that making a interaction design would be more appropriate to get all the specific interactions between the different treatments
OPTION B: INTERACTION DESIGN
Phylum = DESeq(Phylum, fitType="local")
results(Phylum)
log2 fold change (MAP): Time25d.TreatmentDOMNP
Wald test p-value: Time25d.TreatmentDOMNP
DataFrame with 231 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
OTU1 6.671043 0.26544913 0.2541302 1.04453979 0.2962357 0.9216456
OTU2 2.203649 0.25048700 0.2572755 0.97361374 0.3302484 0.9418194
This test above tests all times after Time 0 at once (that is the general effect of treatment during the complete incubation)?
Why does it say just time 25d?
How can I change it to test for the other level of Treatment (DOM)?
If I call resultsNames() I can make much more combinations:
resultsNames(Phylum)
#[1] "Intercept" "Time00d" "Time02d" "Time25d" "TreatmentC"
#[6] "TreatmentDOM" "TreatmentDOMNP" "Time00d.TreatmentC" "Time02d.TreatmentC" "Time25d.TreatmentC"
#[11] "Time00d.TreatmentDOM" "Time02d.TreatmentDOM" "Time25d.TreatmentDOM" "Time00d.TreatmentDOMNP" "Time02d.TreatmentDOMNP"
#[16] "Time25d.TreatmentDOMNP"
Can I test for the general effect of a treatment (DOM) in all time points like this?
res <- results(Phylum, contrast =list("TreatmentC","TreatmentDOM"))
Can I test for the effect of a treatment (DOM) in a specific time point (25d) like this?
res <- results(Phylum, contrast =list("Time25d.TreatmentC","Time25d.TreatmentDOM"))
And for the effect of time in an specific treatment(DOM) like this?
res <- results(Phylum, contrast =list("Time02d.TreatmentDOM","Time25d.TreatmentDOM"))
I am not sure that the time-series approach (if I am using it correctly) is adding too much info to my analysis, in fact, it decreases the amount of contrasts I can make. Am I right?
Thanks so much for your help and for all you patience
Sandra
I'd prefer to just give recommendations for how to investigate your biological questions of interest. There are many ways to code up and run an analysis.
Firstly, you ask "Why does it say just time 25d?". The answer to this is in the help page for results(). Help for R functions is easily accessible by typing ?results in the R console:
Now to address your main biological questions:
"My main questions for this dataset are:
Which is the effect of the different treatments in general (all time points included)?
Which is the effect of the different treatments at each different time?
Which is the difference between DOM and DOMNP in general and in each time point?
Which is the effect of time?"
The easiest way for you to investigate these various comparisons is using the likelihood ratio test. If you are not familiar with the purpose or mechanics of the likelihood ratio test, I highly recommend talking to a local statistician. We explain it briefly in the vignette and help files, but it's best to have a person who you can talk to face-to-face.
I thought of a way for you to code your variables so that you can be specific about which interaction terms you want, and so that you can include the time 0 samples which didn't have treatment.
Using the full dataset which includes time 0, recode the treatment variables like so (I'll use dds as the dataset object):
and do the same for the timepoints after time 0:
Now set the design as:
You can then estimate size factors and dispersions:
Now you can remove terms in order to investigate different questions. To find genes which change due to treatment of DOM or DOMP at time 25, remove the interaction terms for the treatments at day 25. Remember: interaction terms, for example DOM:day25, model an additional effect for DOM at day 25 which is different than the effect at day 2.
If you want to find genes which change due to treatment of DOMP, remove the terms containing DOMP:
To find genes which change due to either DOM or DOMP treatment, remove all the terms with DOM or DOMP in them (including the interaction terms, e.g. DOM:day25). Similarly for genes which change due to time, remove the terms with day02 or day25 in them.