I've tried calculating fitted values from the output of DESeq in two ways that I thought should produce identical results; however, this doesn't seem to be the case.
obj <- DESeq(obj) q_ij <- t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) ) mod <- attr(obj, "modelMatrix") coefs <- coef(obj) q_tst <- coefs %*% t(mod) summary(as.vector(q_tst-log2(q_ij))) Min. 1st Qu. Median Mean 3rd Qu. Max. -6.8360 0.0000 0.0000 -0.1278 0.0000 0.0000 plot(x=(log2(q_ij)+q_tst)/2, y=log2(q_ij)-q_tst)
I've also tried this with betaPrior=FALSE
and minReplicatesForReplace=Inf
and the two fitted value calculations were still not equal. Am I missing something obvious here? In case it's helpful I've uploaded the obj
variable from my example code here: https://dl.dropboxusercontent.com/u/384142/deseq2-test-object.rda
> sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: macOS Sierra 10.12.1 locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets methods base other attached packages: [1] RColorBrewer_1.1-2 gplots_3.0.1 DESeq2_1.14.1 SummarizedExperiment_1.4.0 Biobase_2.34.0 [6] GenomicRanges_1.26.1 GenomeInfoDb_1.10.1 IRanges_2.8.1 S4Vectors_0.12.1 BiocGenerics_0.20.0 [11] sva_3.22.0 genefilter_1.56.0 mgcv_1.8-16 nlme_3.1-128 loaded via a namespace (and not attached): [1] gtools_3.5.0 locfit_1.5-9.1 splines_3.3.2 lattice_0.20-34 colorspace_1.3-1 htmltools_0.3.5 survival_2.40-1 [8] XML_3.98-1.5 foreign_0.8-67 DBI_0.5-1 BiocParallel_1.8.1 plyr_1.8.4 stringr_1.1.0 zlibbioc_1.20.0 [15] munsell_0.4.3 gtable_0.2.0 caTools_1.17.1 memoise_1.0.0 latticeExtra_0.6-28 knitr_1.15.1 geneplotter_1.52.0 [22] AnnotationDbi_1.36.0 htmlTable_1.7 Rcpp_0.12.8 KernSmooth_2.23-15 acepack_1.4.1 xtable_1.8-2 openssl_0.9.5 [29] scales_0.4.1 gdata_2.17.0 base64_2.0 Hmisc_4.0-1 annotate_1.52.0 XVector_0.14.0 gridExtra_2.2.1 [36] ggplot2_2.2.0 digest_0.6.10 stringi_1.1.2 grid_3.3.2 tools_3.3.2 bitops_1.0-6 magrittr_1.5 [43] RCurl_1.95-4.8 lazyeval_0.2.0 RSQLite_1.1-1 tibble_1.2 Formula_1.2-1 cluster_2.0.5 Matrix_1.2-7.1 [50] data.table_1.10.0 assertthat_0.1 rpart_4.1-10 nnet_7.3-12
Thank you for the fast reply. So with betaPrior=FALSE, I still get some sizable differences.
Now with betaPrior=FALSE, some rows don't converge. If I remove those, there are only a few differences between the fitted value calculations, but they aren't decimal dust.
One thing I noticed is that the differences are only in rows that converged in beta but took the maximum iterations to do so:
Ah, I think that's because I'm not doing a final update for mu for those rows that undergo numeric optimization because they don't converge with the IRLS. So the better mu for those rows is the one you are creating by multiplying coefficients. I can fix this in devel.
Ok, I've fixed this in v1.15.25. Now the differences should be within some typical numeric tolerance.
Got it. Thanks!
One related question: is there a way to calculate the fitted values with
betaPrior=TRUE
using the coefficients and model matrix rather thanassays(obj)[["mu"]]
and the size factors?If you want the "shrunken" mu's (so they will be closer to each other than the MLE mu's), you can just multiply the coef matrix by the model matrix and scale up by size factors. I took a look at the mu code just now and made it a bit more consistent (it was complicated and somewhat inconsistent by my having allowed users to run functions without necessarily having gone through expected previous steps, e.g. users providing external dispersion values instead of using estimateDispersions...).
Going forward, the matrix "mu" in assays(dds) will always be the MLE mu's.
Great. Thank you!