Question: DESeq fitted values using coefficients vs mu
3
gravatar for Matthew McCall
2.3 years ago by
United States
Matthew McCall830 wrote:

I've tried calculating fitted values from the output of DESeq in two ways that I thought should produce identical results; however, this doesn't seem to be the case. 

obj <- DESeq(obj)
q_ij <- t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) )
mod <- attr(obj, "modelMatrix")
coefs <- coef(obj)
q_tst <- coefs %*% t(mod)
summary(as.vector(q_tst-log2(q_ij)))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-6.8360  0.0000  0.0000 -0.1278  0.0000  0.0000 

plot(x=(log2(q_ij)+q_tst)/2, y=log2(q_ij)-q_tst)

I've also tried this with betaPrior=FALSE and minReplicatesForReplace=Inf and the two fitted value calculations were still not equal. Am I missing something obvious here? In case it's helpful I've uploaded the obj variable from my example code here: https://dl.dropboxusercontent.com/u/384142/deseq2-test-object.rda

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2         gplots_3.0.1               DESeq2_1.14.1              SummarizedExperiment_1.4.0 Biobase_2.34.0            
 [6] GenomicRanges_1.26.1       GenomeInfoDb_1.10.1        IRanges_2.8.1              S4Vectors_0.12.1           BiocGenerics_0.20.0       
[11] sva_3.22.0                 genefilter_1.56.0          mgcv_1.8-16                nlme_3.1-128              

loaded via a namespace (and not attached):
 [1] gtools_3.5.0         locfit_1.5-9.1       splines_3.3.2        lattice_0.20-34      colorspace_1.3-1     htmltools_0.3.5      survival_2.40-1     
 [8] XML_3.98-1.5         foreign_0.8-67       DBI_0.5-1            BiocParallel_1.8.1   plyr_1.8.4           stringr_1.1.0        zlibbioc_1.20.0     
[15] munsell_0.4.3        gtable_0.2.0         caTools_1.17.1       memoise_1.0.0        latticeExtra_0.6-28  knitr_1.15.1         geneplotter_1.52.0  
[22] AnnotationDbi_1.36.0 htmlTable_1.7        Rcpp_0.12.8          KernSmooth_2.23-15   acepack_1.4.1        xtable_1.8-2         openssl_0.9.5       
[29] scales_0.4.1         gdata_2.17.0         base64_2.0           Hmisc_4.0-1          annotate_1.52.0      XVector_0.14.0       gridExtra_2.2.1     
[36] ggplot2_2.2.0        digest_0.6.10        stringi_1.1.2        grid_3.3.2           tools_3.3.2          bitops_1.0-6         magrittr_1.5        
[43] RCurl_1.95-4.8       lazyeval_0.2.0       RSQLite_1.1-1        tibble_1.2           Formula_1.2-1        cluster_2.0.5        Matrix_1.2-7.1      
[50] data.table_1.10.0    assertthat_0.1       rpart_4.1-10         nnet_7.3-12     

 

 

 

deseq2 • 578 views
ADD COMMENTlink modified 2.3 years ago by Michael Love23k • written 2.3 years ago by Matthew McCall830
Answer: DESeq fitted values using coefficients vs mu
4
gravatar for Michael Love
2.3 years ago by
Michael Love23k
United States
Michael Love23k wrote:

hi Matthew,

If you use betaPrior=FALSE, they should be very close. I get 2e-15 with some simulated data:

library(DESeq2)
set.seed(1)
dds <- makeExampleDESeqDataSet()
dds <- dds[rowSums(counts(dds)) > 0,]
dds <- DESeq(dds, betaPrior=FALSE)
log2q1 <- log2(t(t(assays(dds)[["mu"]])/sizeFactors(dds)))
log2q2 <- coef(dds) %*% t(attr(dds, "modelMatrix"))
dev <- log2q1[,1] - log2q2[,1]
summary(abs(dev))

     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.000e+00 0.000e+00 0.000e+00 8.184e-17 0.000e+00 1.776e-15 

> packageVersion("DESeq2")
[1] ‘1.14.0’

I see equal deviations from zero per gene (for a gene, all samples have the same deviation if it's nonzero) which I think corresponds to the intercept changing by small amounts. I think what is going on in terms of the difference is the conversion of coefficients from log scale to log2 scale which happens internally.

ADD COMMENTlink written 2.3 years ago by Michael Love23k
1

Thank you for the fast reply. So with betaPrior=FALSE, I still get some sizable differences.

obj <- DESeq(obj, betaPrior=FALSE)
log2q1 <- log2(t( t( assays(obj)[["mu"]] ) / sizeFactors(obj) ))
log2q2 <- coef(obj) %*% t(attr(obj, "modelMatrix"))
summary(as.vector(log2q1-log2q2))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.0000  0.0000  0.0000  0.5133  0.0000 34.5000 

Now with betaPrior=FALSE, some rows don't converge. If I remove those, there are only a few differences between the fitted value calculations, but they aren't decimal dust.

ind <- which(mcols(obj)$betaConv)
summary(as.vector(log2q1[ind,]-log2q2[ind,]))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.000000 0.000000 0.000000 0.000895 0.000000 1.320000 

One thing I noticed is that the differences are only in rows that converged in beta but took the maximum iterations to do so:

which(log2q1[ind,]-log2q2[ind,] > 0.01, arr.ind=TRUE)

              row col
hsa-let-7b-3p   3 424
hsa-let-7b-3p   3 426
hsa-let-7b-3p   3 435
hsa-let-7a-3p   1 436
hsa-let-7b-3p   3 436
hsa-let-7b-3p   3 437
hsa-let-7b-3p   3 438

mcols(obj)[ind,326:327]

DataFrame with 7 rows and 2 columns
   betaConv  betaIter
  <logical> <numeric>
1      TRUE       100
2      TRUE         6
3      TRUE       100
4      TRUE        17
5      TRUE        21
6      TRUE         5
7      TRUE        15
ADD REPLYlink written 2.3 years ago by Matthew McCall830
1

Ah, I think that's because I'm not doing a final update for mu for those rows that undergo numeric optimization because they don't converge with the IRLS. So the better mu for those rows is the one you are creating by multiplying coefficients. I can fix this in devel.

ADD REPLYlink written 2.3 years ago by Michael Love23k
1

Ok, I've fixed this in v1.15.25. Now the differences should be within some typical numeric tolerance.

ADD REPLYlink written 2.3 years ago by Michael Love23k

Got it. Thanks!

ADD REPLYlink written 2.3 years ago by Matthew McCall830
1

One related question: is there a way to calculate the fitted values with betaPrior=TRUE using the coefficients and model matrix rather than assays(obj)[["mu"]] and the size factors?

ADD REPLYlink written 2.3 years ago by Matthew McCall830
1

If you want the "shrunken" mu's (so they will be closer to each other than the MLE mu's), you can just multiply the coef matrix by the model matrix and scale up by size factors. I took a look at the mu code just now and made it a bit more consistent (it was complicated and somewhat inconsistent by my having allowed users to run functions without necessarily having gone through expected previous steps, e.g. users providing external dispersion values instead of using estimateDispersions...).

Going forward, the matrix "mu" in assays(dds) will always be the MLE mu's.

ADD REPLYlink written 2.3 years ago by Michael Love23k

Great. Thank you!

ADD REPLYlink written 2.3 years ago by Matthew McCall830
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 259 users visited in the last hour