Question: DESeq fitted values using coefficients vs mu
3
2.8 years ago by
United States
Matthew McCall830 wrote:

I've tried calculating fitted values from the output of DESeq in two ways that I thought should produce identical results; however, this doesn't seem to be the case.

obj <- DESeq(obj)
q_ij <- t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) )
mod <- attr(obj, "modelMatrix")
coefs <- coef(obj)
q_tst <- coefs %*% t(mod)
summary(as.vector(q_tst-log2(q_ij)))

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-6.8360  0.0000  0.0000 -0.1278  0.0000  0.0000

plot(x=(log2(q_ij)+q_tst)/2, y=log2(q_ij)-q_tst)

I've also tried this with betaPrior=FALSE and minReplicatesForReplace=Inf and the two fitted value calculations were still not equal. Am I missing something obvious here? In case it's helpful I've uploaded the obj variable from my example code here: https://dl.dropboxusercontent.com/u/384142/deseq2-test-object.rda

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] RColorBrewer_1.1-2         gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0 Biobase_2.34.0
[6] GenomicRanges_1.26.1       GenomeInfoDb_1.10.1        IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0
[11] sva_3.22.0                 genefilter_1.56.0          mgcv_1.8-16                nlme_3.1-128

loaded via a namespace (and not attached):
[1] gtools_3.5.0         locfit_1.5-9.1       splines_3.3.2        lattice_0.20-34      colorspace_1.3-1     htmltools_0.3.5      survival_2.40-1
[8] XML_3.98-1.5         foreign_0.8-67       DBI_0.5-1            BiocParallel_1.8.1   plyr_1.8.4           stringr_1.1.0        zlibbioc_1.20.0
[15] munsell_0.4.3        gtable_0.2.0         caTools_1.17.1       memoise_1.0.0        latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.52.0
[22] AnnotationDbi_1.36.0 htmlTable_1.7        Rcpp_0.12.8          KernSmooth_2.23-15   acepack_1.4.1        xtable_1.8-2         openssl_0.9.5
[29] scales_0.4.1         gdata_2.17.0         base64_2.0           Hmisc_4.0-1          annotate_1.52.0      XVector_0.14.0       gridExtra_2.2.1
[36] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2        grid_3.3.2           tools_3.3.2          bitops_1.0-6         magrittr_1.5
[43] RCurl_1.95-4.8       lazyeval_0.2.0       RSQLite_1.1-1        tibble_1.2           Formula_1.2-1        cluster_2.0.5        Matrix_1.2-7.1
[50] data.table_1.10.0    assertthat_0.1       rpart_4.1-10         nnet_7.3-12     

deseq2 • 759 views
modified 2.8 years ago by Michael Love25k • written 2.8 years ago by Matthew McCall830
Answer: DESeq fitted values using coefficients vs mu
4
2.8 years ago by
Michael Love25k
United States
Michael Love25k wrote:

hi Matthew,

If you use betaPrior=FALSE, they should be very close. I get 2e-15 with some simulated data:

library(DESeq2)
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- dds[rowSums(counts(dds)) > 0,]
dds <- DESeq(dds, betaPrior=FALSE)
log2q1 <- log2(t(t(assays(dds)[["mu"]])/sizeFactors(dds)))
log2q2 <- coef(dds) %*% t(attr(dds, "modelMatrix"))
dev <- log2q1[,1] - log2q2[,1]
summary(abs(dev))

Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.000e+00 0.000e+00 0.000e+00 8.184e-17 0.000e+00 1.776e-15

> packageVersion("DESeq2")
[1] ‘1.14.0’


I see equal deviations from zero per gene (for a gene, all samples have the same deviation if it's nonzero) which I think corresponds to the intercept changing by small amounts. I think what is going on in terms of the difference is the conversion of coefficients from log scale to log2 scale which happens internally.

1

Thank you for the fast reply. So with betaPrior=FALSE, I still get some sizable differences.

obj <- DESeq(obj, betaPrior=FALSE)
log2q1 <- log2(t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) ))
log2q2 <- coef(obj) %*% t(attr(obj, "modelMatrix"))
summary(as.vector(log2q1-log2q2))

Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.0000  0.0000  0.0000  0.5133  0.0000 34.5000 

Now with betaPrior=FALSE, some rows don't converge. If I remove those, there are only a few differences between the fitted value calculations, but they aren't decimal dust.

ind <- which(mcols(obj)\$betaConv)
summary(as.vector(log2q1[ind,]-log2q2[ind,]))

Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.000000 0.000000 0.000000 0.000895 0.000000 1.320000 

One thing I noticed is that the differences are only in rows that converged in beta but took the maximum iterations to do so:

which(log2q1[ind,]-log2q2[ind,] > 0.01, arr.ind=TRUE)

row col
hsa-let-7b-3p   3 424
hsa-let-7b-3p   3 426
hsa-let-7b-3p   3 435
hsa-let-7a-3p   1 436
hsa-let-7b-3p   3 436
hsa-let-7b-3p   3 437
hsa-let-7b-3p   3 438

mcols(obj)[ind,326:327]

DataFrame with 7 rows and 2 columns
betaConv  betaIter
<logical> <numeric>
1      TRUE       100
2      TRUE         6
3      TRUE       100
4      TRUE        17
5      TRUE        21
6      TRUE         5
7      TRUE        15
1

Ah, I think that's because I'm not doing a final update for mu for those rows that undergo numeric optimization because they don't converge with the IRLS. So the better mu for those rows is the one you are creating by multiplying coefficients. I can fix this in devel.

1

Ok, I've fixed this in v1.15.25. Now the differences should be within some typical numeric tolerance.

Got it. Thanks!

1

One related question: is there a way to calculate the fitted values with betaPrior=TRUE using the coefficients and model matrix rather than assays(obj)[["mu"]] and the size factors?

1

If you want the "shrunken" mu's (so they will be closer to each other than the MLE mu's), you can just multiply the coef matrix by the model matrix and scale up by size factors. I took a look at the mu code just now and made it a bit more consistent (it was complicated and somewhat inconsistent by my having allowed users to run functions without necessarily having gone through expected previous steps, e.g. users providing external dispersion values instead of using estimateDispersions...).

Going forward, the matrix "mu" in assays(dds) will always be the MLE mu's.