Hi,
I am trying to run paired and multi factor comparison by DESeq2 and keep getting errors. I have read other posts such as DESeq2 paired multifactor test and Yet another DESeq2 nested design contrast matrix question but still cannot figure out why my design does not work.
My sample table looks like:
Treatment | Animal | Time_point |
A | m1 | d0 |
A | m2 | d0 |
A | m3 | d0 |
A | m1 | d4 |
A | m2 | d4 |
A | m3 | d4 |
A | m1 | d5 |
A | m2 | d5 |
A | m3 | d5 |
B | m1 | d0 |
B | m2 | d0 |
B | m3 | d0 |
B | m1 | d4 |
B | m2 | d4 |
B | m3 | d4 |
B | m1 | d5 |
B | m2 | d5 |
B | m3 | d5 |
C | m1 | d0 |
C | m2 | d0 |
C | m3 | d0 |
C | m1 | d4 |
C | m2 | d4 |
C | m3 | d4 |
C | m1 | d5 |
C | m2 | d5 |
C | m3 | d5 |
The same animals in the same treatment group are paired.
My questions are:
- Differences between d0 vs d5 in each treatment
- Differences between d4 vs d5 in each treatment
- Differences between d0-d5 difference of each treatment
I have tried:
- ~ Animal + Time_point + Treatment + Time_point:Treatment
- ~ Treatment + Treatment:Animal+ Treatment:Time_point
In both cases, I got an error:
"Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed"
What would be the right design formula in this case?
Thank you in advance.
Shoko
> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base
other attached packages: [1] DESeq2_1.6.2 RcppArmadillo_0.4.500.0 Rcpp_0.11.3 [4] GenomicRanges_1.18.1 GenomeInfoDb_1.2.3 IRanges_2.0.0 [7] S4Vectors_0.4.0 BiocGenerics_0.12.1
loaded via a namespace (and not attached): [1] acepack_1.3-3.3 annotate_1.44.0 AnnotationDbi_1.28.1 [4] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.8 [7] Biobase_2.26.0 BiocParallel_1.0.0 brew_1.0-6 [10] checkmate_1.5.0 cluster_1.15.3 codetools_0.2-9 [13] colorspace_1.2-4 DBI_0.3.1 digest_0.6.4 [16] fail_1.2 foreach_1.4.2 foreign_0.8-61 [19] Formula_1.1-2 genefilter_1.48.1 geneplotter_1.44.0 [22] ggplot2_1.0.0 grid_3.1.2 gtable_0.1.2 [25] Hmisc_3.14-5 iterators_1.0.7 lattice_0.20-29 [28] latticeExtra_0.6-26 locfit_1.5-9.1 MASS_7.3-35 [31] munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 [34] proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.4 [37] rpart_4.1-8 RSQLite_1.0.0 scales_0.2.4 [40] sendmailR_1.2-1 splines_3.1.2 stringr_0.6.2 [43] survival_2.37-7 tools_3.1.2 XML_3.98-1.1 [46] xtable_1.7-4 XVector_0.6.0
hi Shoko,
You are encountering this error, because you have missing samples dispersed in the cross of treatment and time, which means you need to do a bit more work to run the model. More detail on what is happening inside DESeq(): when you specify a design with Treatment:Time_point, the base R function used internally (model.matrix) in DESeq() forms an interaction between the non-base level time points and all the treatments. However, as seen in the table you posted, some of these interaction terms cannot be fit because of missing data. For example, what should be the d4 vs d0 effect for the Treatment/Group 12? There are no d4 samples for Treatment/Group 12 so this effect cannot possibly be fit. So when model.matrix is called inside DESeq(), this results in a matrix with some columns with all 0's. You can manually remove these columns and fit such a model, and then you can build results tables, except for those comparisons which are impossible to make given the experimental design (e.g. the one mentioned above).
There is not currently a solution to simply use DESeq() and supply a modified design matrix (though I hope to implement this during this devel cycle).
However, in version 1.6 you can get around this, by calling the nbinomLRT function with a custom model matrix, and then extract contrasts using results() and the name or list type for the 'contrast' argument (see the description of contrast types in ?results). This would look like:
Thanks for the reply. I am trying your code but have problems creating DESeqDataSet object. When I tried "design(dds) <- ~ Treatment + Treatment:Animal + Treatment:Time_point", I had an error "object 'dds' not found". Thus, I tried to create DESeqDataSet object by
dds <- DESeqDataSetFromMatrix(
countData = as.matrix(abundDF),
colData = metaDF,
design = formula(~ Treatment + Treatment:Animal + Treatment:Time_point)
)
However, it gives an error (that I originally had), "Error in DESeqDataSet(se, design = design, ignoreRank) : the model matrix is not full rank, so the model cannot be fit as specified. one or more variables or interaction terms in the design formula are linear combinations of the others and must be removed"
Thanks again for your help.
Shoko
hi Shoko,
Sorry, I see the problem. Just use design = ~ Treatment. It will actually be skipped in all of the steps, where the matrices are provided instead, so it doesn't matter what to put here (except you need to bypass the checks on rank). I still need to work on the steps where user supplies a custom design matrix.
Thank you for all your help. Finally I had it run and got some results. Here is the summary code I used for the future reference.
dds <- DESeqDataSetFromMatrix(
countData = as.matrix(abundDF),
colData = metaDF,
design = formula(~ Treatment)
)
mm1 <- model.matrix(~ Treatment + Treatment:Animal + Treatment:Time_point, metaDF)
idx <- which(colSums(mm1 == 0) == nrow(mm1))
mm1 <- mm1[,-idx]
mm0 <- model.matrix(~ Treatment + Treatment:Animal, metaDF)
design(dds) <- ~ Treatment + Treatment:Animal + Treatment:Time_point
dds <- estimateSizeFactors(dds)
dds <- estimateDispersionsGeneEst(dds, modelMatrix=mm1)
dds <- estimateDispersionsFit(dds)
dds <- estimateDispersionsMAP(dds, modelMatrix=mm1)
dds <- nbinomLRT(dds, full=mm1, reduced=mm0)
# Compare d5 v d0
Treatment1_d0v5 = results(dds, name="Treatment1.Time_pointd5", test="Wald")
# Compare d4 v d5
Treatment1_d4v5 = results(dds, contrast=list("Treatment1.Time_pointd5", "Treatment1.Time_pointd4"), test="Wald")
# Compare two different treatment
# Treatment1_v_Treatment2 = results(dds, contrast=list("Treatment1.Time_pointd5","Treatment2.Time_pointd5"), test="Wald")