Question

Error in ENmix::bmiq.mc(): BMIQ estimates encountered error, try to run it again

0

Entering edit mode

moldach ▴ 20

@moldach-8829

Last seen 3.9 years ago

Canada/Montreal/Douglas Mental Health I…

I'm getting a rather cryptic error from the ENmix package's bmiq.mc() function:

Error in ENmix::bmiq.mc(forDNAm_preprocess2, nCores = 2) : BMIQ estimates encountered error, try to run it again

I've narrowed the problem down over the past couple of weeks and getting closer to the issue at hand.

It appears that running the following on only buccal samples works. However, blood samples cause ENmix::bmiq.mc() to fail with the above error.

library(minfi)
library(ENmix)

# load rgSet
rgSet_1 <- read_rds("file1.rds")

# get beta values matrix
betas_1 = minfi::getBeta(rgSet_1)

# check for NA's
sumis.na(betas_1))  # 2643

# preprocess rgSet
forDNAm_preprocess1 = minfi::preprocessNoob(rgSet_1)

# check for NA betas after preprocessNoob
preprocessBetas_1 = getBeta(forDNAm_preprocess1)
sumis.na(preprocessBetas_1)) # 0 NA's

# BMIQ normalize preprocessed rgSet
forDNAm_1 = ENmix::bmiq.mc(forDNAm_preprocess1, nCores = 2)

I was trying to root out which (or all) blood samples are causing the issue, therefore I wanted to run the samples one-by-one (this should work in theory as BMIQ normalizes type-2 probe bias for each sample separately and independently from all other samples). Unfortunately the minfi::preprocessNoob(rgSet) step, which precedes it, requires more than one sample or it gives the error:

Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0

Thus, I tried the following approach. First, I attempted with only two samples:

2176_Blood  B1  WG5121380   2176_Blood      200923040002    R02C01
2005_Blood  C1  WG5121380   2005_Blood      200923040002    R03C01

ENmix::bmiq.mc() ran with no problems. So then I tried three samples:

2176_Blood  B1  WG5121380   2176_Blood      200923040002    R02C01
2005_Blood  C1  WG5121380   2005_Blood      200923040002    R03C01
2121_Blood  E2  WG5121380   2121_Blood      200923040004    R05C01

This failed with the error. Okay, so sample 2121_Blood is "bad" right? Not so fast, If I run the following it works:

2005_Blood  C1  WG5121380   2005_Blood      200923040002    R03C01
2121_Blood  E2  WG5121380   2121_Blood      200923040004    R05C01

OR

2176_Blood  B1  WG5121380   2176_Blood      200923040002    R02C01
2121_Blood  E2  WG5121380   2121_Blood      200923040004    R05C01

I've tried various combinations and it seems to be that adding 3+ (blood) samples causes BMIQ to fail.

The author of ENmix did a wrapper around the BMIQ method from the wateRmelon package. I contacted Leonard Schalkwyk, author of wateRmelon but he said that it's actual a wrapper from the actual BMIQ method written by Andrew Teschendorff.

Leonard told me:

"He wrote that method to deal with some quite nontypical data and I wouldn't recommend it for general use. The mixture model is fit by an iterative process which is quite computationally heavy. In the original paper by Andrew the number of iterations used wouldn't be practical for a data set of any size. I can't say the default parameters in the wateRmelon version are particularly clever or guaranteed to be appropriate and the number of iterations may not be enough for it to converge in your case. That would probably be the first place to look -- if you have a reason to use BMIQ. Otherwise you're probably better off with any straightforward quantile normaliser such as dasen."

I had also reached out to Andrew

"The optimization can sometimes fail to converge, which could cause the wrapper function which runs on all samples using parallel to fail if one sample fails to converge. It very rarely happens."

I suppose the reason for using BMIQ is because we've used this script for other cohorts and want to keep it the same for consistencies sake when we try to publish. Not sure whether I should try to optimize the number of iterations or resort to using a different normaliser/preprocessing step altogether and try explaining this to the reviewers?

> sessionInfo()

R version 3.5.1 (2018-07-02) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale: [1] LCCTYPE=enUS.UTF-8 LCNUMERIC=C LCTIME=enUS.UTF-8
[4] LCCOLLATE=enUS.UTF-8 LCMONETARY=enUS.UTF-8 LCMESSAGES=enUS.UTF-8
[7] LCPAPER=enUS.UTF-8 LCNAME=C LCADDRESS=C
[10] LCTELEPHONE=C LCMEASUREMENT=enUS.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods
[9] base

other attached packages: [1] IlluminaHumanMethylationEPICanno.ilm10b4.hg190.6.0 [2] IlluminaHumanMethylationEPICmanifest0.3.0
[3] bindrcpp0.2.2
[4] sva3.30.1
[5] genefilter1.64.0
[6] mgcv1.8-27
[7] nlme3.1-137
[8] limma3.38.3
[9] ggpubr0.2
[10] lubridate1.7.4
[11] rlist0.4.6.1
[12] magrittr1.5
[13] readxl1.2.0
[14] data.table1.12.0
[15] minfi1.28.3
[16] bumphunter1.24.5
[17] locfit1.5-9.1
[18] iterators1.0.10
[19] foreach1.4.4
[20] Biostrings2.50.2
[21] XVector0.22.0
[22] SummarizedExperiment1.12.0
[23] DelayedArray0.8.0
[24] BiocParallel1.16.5
[25] matrixStats0.54.0
[26] Biobase2.42.0
[27] GenomicRanges1.34.0
[28] GenomeInfoDb1.18.1
[29] IRanges2.16.0
[30] S4Vectors0.20.1
[31] BiocGenerics0.28.0
[32] forcats0.3.0
[33] stringr1.3.1
[34] dplyr0.7.8
[35] purrr0.3.0
[36] readr1.3.1
[37] tidyr0.8.2
[38] tibble2.0.1
[39] ggplot23.1.0
[40] tidyverse1.2.1

loaded via a namespace (and not attached): [1] backports1.1.3
[2] plyr1.8.4
[3] lazyeval0.2.1
[4] splines3.5.1
[5] digest0.6.18
[6] memoise1.1.0.9000
[7] doParallel1.0.14
[8] annotate1.60.0
[9] modelr0.1.3
[10] askpass1.1
[11] siggenes1.56.0
[12] prettyunits1.0.2
[13] colorspace1.4-0
[14] blob1.1.1
[15] rvest0.3.2
[16] haven2.0.0
[17] crayon1.3.4
[18] RCurl1.95-4.11
[19] jsonlite1.6
[20] impute1.56.0
[21] bindr0.1.1
[22] GEOquery2.50.5
[23] survival2.43-3
[24] glue1.3.0
[25] registry0.5
[26] gtable0.2.0
[27] lumi2.34.0
[28] zlibbioc1.28.0
[29] Rhdf5lib1.4.2
[30] HDF5Array1.10.1
[31] scales1.0.0
[32] DBI1.0.0
[33] rngtools1.3.1
[34] bibtex0.4.2
[35] highlite0.0.0.9000
[36] Rcpp1.0.0
[37] xtable1.8-3
[38] progress1.2.0
[39] bit1.1-14
[40] mclust5.4.2
[41] preprocessCore1.44.0
[42] ENmix1.18.1
[43] httr1.4.0
[44] RColorBrewer1.1-2
[45] pkgconfig2.0.2
[46] reshape0.8.8
[47] XML3.98-1.16
[48] tidyselect0.2.5
[49] rlang0.3.1
[50] AnnotationDbi1.44.0
[51] munsell0.5.0
[52] cellranger1.1.0
[53] tools3.5.1
[54] cli1.0.1
[55] generics0.0.2
[56] RSQLite2.1.1
[57] broom0.5.1
[58] yaml2.2.0
[59] bit640.9-7
[60] beanplot1.2
[61] methylumi2.28.0
[62] ROC1.58.0
[63] doRNG1.7.1
[64] nor1mix1.2-3
[65] wateRmelon1.26.0
[66] xml21.2.0
[67] biomaRt2.38.0
[68] lookup0.0.0.9000
[69] compiler3.5.1
[70] rstudioapi0.9.0
[71] affyio1.52.0
[72] geneplotter1.60.0
[73] stringi1.2.4
[74] IlluminaHumanMethylation450kanno.ilmn12.hg190.6.0 [75] GenomicFeatures1.34.3
[76] lattice0.20-38
[77] Matrix1.2-15
[78] multtest2.38.0
[79] pillar1.3.1
[80] BiocManager1.30.4
[81] bitops1.0-6
[82] rtracklayer1.42.1
[83] affy1.60.0
[84] R62.3.0
[85] KernSmooth2.23-15
[86] nleqslv3.3.2
[87] codetools0.2-16
[88] MASS7.3-51.1
[89] assertthat0.2.0
[90] rhdf52.26.2
[91] openssl1.2.1
[92] pkgmaker0.27
[93] withr2.1.2
[94] GenomicAlignments1.18.1
[95] Rsamtools1.34.1
[96] GenomeInfoDbData1.2.0
[97] hms0.4.2
[98] quadprog1.5-5
[99] grid3.5.1
[100] base642.0
[101] DelayedMatrixStats1.4.0
[102] illuminaio0.24.0

methylation ENmix BMIQ wateRmelon • 1.1k views

ADD COMMENT • link 5.2 years ago moldach ▴ 20