Question: DRIMSeq continuous variable in design matrix - Error during precision estimation
7 months ago by
fiona.dick9110

Hi,

I tried to use DRIMSeq to test for DTU with a design matrix that looks like this (this is just an example; I have 49 samples):

[1] "DESIGN: Formula:"
~condition + cov1 + cov2
[1] "Design matrix:"
(Intercept) condition cov1 cov2
1            1        86 6.4        0
2            1        85 6.6        1
3            1        84 8.7        0
4            1        84 6.4        0
5            1        84 7.2        1
6            1        76 7.4        0
7            1        80 6.8        1
8            1        89 5.4        1
9            1        81 7.2        0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$cohort
[1] "contr.treatment"


When applying the function DRIMSeq::dmPrecision like so:

# design_full is the design matrix printed above
design_full <- model.matrix(designFormula, data = DRIMSeq::samples(d))
d <- dmPrecision(d, design = design_full)


I get the following error:

! Using a subset of 0.1 genes to estimate common precision !

Error in optimHess(par = par, fn = dm_lik_regG, gr = dm_score_regG, x = x,  :
non-finite value supplied by optim


I wanted to ask what exactly this could be due to. If I exclude the continuous variable from the design matrix, I don't run into this error. I'd be happy for any suggestions.

Fiona

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.0
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
[1] LC_CTYPE=en_DK.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] DRIMSeq_1.10.1 nvimcom_0.9-58

loaded via a namespace (and not attached):
[1] Rcpp_0.12.18           compiler_3.5.2         pillar_1.3.1
[4] GenomeInfoDb_1.18.1    plyr_1.8.4             XVector_0.21.3
[7] bindr_0.1.1            bitops_1.0-6           tools_3.5.2
[10] zlibbioc_1.28.0        tibble_1.4.2           gtable_0.2.0
[13] lattice_0.20-38        pkgconfig_2.0.2        rlang_0.3.0.1
[16] parallel_3.5.2         bindrcpp_0.2.2         GenomeInfoDbData_1.2.0
[19] stringr_1.3.1          dplyr_0.7.7            S4Vectors_0.19.19
[22] IRanges_2.15.16        locfit_1.5-9.1         stats4_3.5.2
[25] grid_3.5.2             tidyselect_0.2.5       glue_1.3.0
[28] R6_2.3.0               BiocParallel_1.15.8    limma_3.37.4
[31] reshape2_1.4.3         purrr_0.2.5            ggplot2_3.1.0
[34] edgeR_3.23.5           magrittr_1.5           scales_1.0.0
[37] BiocGenerics_0.28.0    GenomicRanges_1.32.4   assertthat_0.2.0
[40] colorspace_1.4-0       stringi_1.2.4          RCurl_1.95-4.11
[43] lazyeval_0.2.1         munsell_0.5.0          crayon_1.3.4


Hello Fiona,

There are two different computational strategies, used depending on whether your design is simple (multiple groups) or more complex (continuous covariates). In the first case, DM parameters are estimated per group. In the second case, a regression approach with Hessians is used. It somehow breaks in the second case. If your data is not sensitive, would you like to share it with me via email or Dropbox so I can have a look into it? I think 10 samples would be enough.
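While the cause is still open, one thing that may be worth checking is how well conditioned the design matrix is, since regression-style optimisation can be sensitive to covariates on very different scales. The sketch below uses only base R and the nine example rows printed in the question. The names samples_df, scaled_df, design_raw and design_scaled are illustrative, and whether centring and scaling actually avoids the optimHess error here is an assumption, not something confirmed in this thread.

```r
# Values copied from the example design matrix printed in the question.
samples_df <- data.frame(
  condition = c(86, 85, 84, 84, 84, 76, 80, 89, 81),
  cov1      = c(6.4, 6.6, 8.7, 6.4, 7.2, 7.4, 6.8, 5.4, 7.2),
  cov2      = c(0, 1, 0, 0, 1, 0, 1, 1, 0)
)

# Design matrix on the original scale, as in the question.
design_raw <- model.matrix(~ condition + cov1 + cov2, data = samples_df)

# Check that the matrix has full column rank.
rank_raw <- qr(design_raw)$rank

# Condition number (ratio of largest to smallest singular value).
# Large values indicate a numerically awkward matrix.
kappa_raw <- kappa(design_raw, exact = TRUE)

# Centre and scale the continuous covariates; leave the 0/1 covariate as is.
scaled_df <- samples_df
scaled_df$condition <- as.numeric(scale(samples_df$condition))
scaled_df$cov1      <- as.numeric(scale(samples_df$cov1))

design_scaled <- model.matrix(~ condition + cov1 + cov2, data = scaled_df)
kappa_scaled  <- kappa(design_scaled, exact = TRUE)

cat("rank:", rank_raw, "of", ncol(design_raw), "columns\n")
cat("condition number, raw:   ", kappa_raw, "\n")
cat("condition number, scaled:", kappa_scaled, "\n")
```

If the scaled matrix looks better behaved, it could be passed to dmPrecision(d, design = design_scaled) in the same way as design_full in the question. Note that scaling changes the units of the coefficients for the scaled covariates, though not the fit itself.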

All the best,

Gosia