I'm getting a rather cryptic error from the ENmix
package's bmiq.mc()
function:
Error in ENmix::bmiq.mc(forDNAm_preprocess2, nCores = 2) : BMIQ estimates encountered error, try to run it again
I've narrowed the problem down over the past couple of weeks and getting closer to the issue at hand.
It appears that running the following on only buccal samples works. However, blood samples cause ENmix::bmiq.mc()
to fail with the above error.
library(minfi)
library(ENmix)
# load rgSet
rgSet_1 <- read_rds("file1.rds")
# get beta values matrix
betas_1 = minfi::getBeta(rgSet_1)
# check for NA's
sumis.na(betas_1)) # 2643
# preprocess rgSet
forDNAm_preprocess1 = minfi::preprocessNoob(rgSet_1)
# check for NA betas after preprocessNoob
preprocessBetas_1 = getBeta(forDNAm_preprocess1)
sumis.na(preprocessBetas_1)) # 0 NA's
# BMIQ normalize preprocessed rgSet
forDNAm_1 = ENmix::bmiq.mc(forDNAm_preprocess1, nCores = 2)
I was trying to root out which (or all) blood samples are causing the issue, therefore I wanted to run the samples one-by-one (this should work in theory as BMIQ
normalizes type-2 probe bias for each sample separately and independently from all other samples). Unfortunately the minfi::preprocessNoob(rgSet)
step, which precedes it, requires more than one sample or it gives the error:
Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0
Thus, I tried the following approach. First, I attempted with only two samples:
2176_Blood B1 WG5121380 2176_Blood 200923040002 R02C01
2005_Blood C1 WG5121380 2005_Blood 200923040002 R03C01
ENmix::bmiq.mc()
ran with no problems. So then I tried three samples:
2176_Blood B1 WG5121380 2176_Blood 200923040002 R02C01
2005_Blood C1 WG5121380 2005_Blood 200923040002 R03C01
2121_Blood E2 WG5121380 2121_Blood 200923040004 R05C01
This failed with the error. Okay, so sample 2121_Blood
is "bad" right? Not so fast, If I run the following it works:
2005_Blood C1 WG5121380 2005_Blood 200923040002 R03C01
2121_Blood E2 WG5121380 2121_Blood 200923040004 R05C01
OR
2176_Blood B1 WG5121380 2176_Blood 200923040002 R02C01
2121_Blood E2 WG5121380 2121_Blood 200923040004 R05C01
I've tried various combinations and it seems to be that adding 3+
(blood) samples causes BMIQ
to fail.
The author of ENmix
did a wrapper around the BMIQ
method from the wateRmelon
package. I contacted Leonard Schalkwyk, author of wateRmelon
but he said that it's actual a wrapper from the actual BMIQ
method written by Andrew Teschendorff.
Leonard told me:
"He wrote that method to deal with some quite nontypical data and I wouldn't recommend it for general use. The mixture model is fit by an iterative process which is quite computationally heavy. In the original paper by Andrew the number of iterations used wouldn't be practical for a data set of any size. I can't say the default parameters in the wateRmelon version are particularly clever or guaranteed to be appropriate and the number of iterations may not be enough for it to converge in your case. That would probably be the first place to look -- if you have a reason to use
BMIQ
. Otherwise you're probably better off with any straightforward quantile normaliser such asdasen
."
I had also reached out to Andrew
"The optimization can sometimes fail to converge, which could cause the wrapper function which runs on all samples using parallel to fail if one sample fails to converge. It very rarely happens."
I suppose the reason for using BMIQ
is because we've used this script for other cohorts and want to keep it the same for consistencies sake when we try to publish. Not sure whether I should try to optimize the number of iterations or resort to using a different normaliser/preprocessing step altogether and try explaining this to the reviewers?
> sessionInfo()
R version 3.5.1 (2018-07-02) Platform: x86_64-redhat-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)
Matrix products: default BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale: [1] LCCTYPE=enUS.UTF-8 LCNUMERIC=C LCTIME=enUS.UTF-8
[4] LCCOLLATE=enUS.UTF-8 LCMONETARY=enUS.UTF-8 LCMESSAGES=enUS.UTF-8
[7] LCPAPER=enUS.UTF-8 LCNAME=C LCADDRESS=C
[10] LCTELEPHONE=C LCMEASUREMENT=enUS.UTF-8 LC_IDENTIFICATION=Cattached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods
[9] baseother attached packages: [1] IlluminaHumanMethylationEPICanno.ilm10b4.hg190.6.0 [2] IlluminaHumanMethylationEPICmanifest0.3.0
[3] bindrcpp0.2.2
[4] sva3.30.1
[5] genefilter1.64.0
[6] mgcv1.8-27
[7] nlme3.1-137
[8] limma3.38.3
[9] ggpubr0.2
[10] lubridate1.7.4
[11] rlist0.4.6.1
[12] magrittr1.5
[13] readxl1.2.0
[14] data.table1.12.0
[15] minfi1.28.3
[16] bumphunter1.24.5
[17] locfit1.5-9.1
[18] iterators1.0.10
[19] foreach1.4.4
[20] Biostrings2.50.2
[21] XVector0.22.0
[22] SummarizedExperiment1.12.0
[23] DelayedArray0.8.0
[24] BiocParallel1.16.5
[25] matrixStats0.54.0
[26] Biobase2.42.0
[27] GenomicRanges1.34.0
[28] GenomeInfoDb1.18.1
[29] IRanges2.16.0
[30] S4Vectors0.20.1
[31] BiocGenerics0.28.0
[32] forcats0.3.0
[33] stringr1.3.1
[34] dplyr0.7.8
[35] purrr0.3.0
[36] readr1.3.1
[37] tidyr0.8.2
[38] tibble2.0.1
[39] ggplot23.1.0
[40] tidyverse1.2.1loaded via a namespace (and not attached): [1] backports1.1.3
[2] plyr1.8.4
[3] lazyeval0.2.1
[4] splines3.5.1
[5] digest0.6.18
[6] memoise1.1.0.9000
[7] doParallel1.0.14
[8] annotate1.60.0
[9] modelr0.1.3
[10] askpass1.1
[11] siggenes1.56.0
[12] prettyunits1.0.2
[13] colorspace1.4-0
[14] blob1.1.1
[15] rvest0.3.2
[16] haven2.0.0
[17] crayon1.3.4
[18] RCurl1.95-4.11
[19] jsonlite1.6
[20] impute1.56.0
[21] bindr0.1.1
[22] GEOquery2.50.5
[23] survival2.43-3
[24] glue1.3.0
[25] registry0.5
[26] gtable0.2.0
[27] lumi2.34.0
[28] zlibbioc1.28.0
[29] Rhdf5lib1.4.2
[30] HDF5Array1.10.1
[31] scales1.0.0
[32] DBI1.0.0
[33] rngtools1.3.1
[34] bibtex0.4.2
[35] highlite0.0.0.9000
[36] Rcpp1.0.0
[37] xtable1.8-3
[38] progress1.2.0
[39] bit1.1-14
[40] mclust5.4.2
[41] preprocessCore1.44.0
[42] ENmix1.18.1
[43] httr1.4.0
[44] RColorBrewer1.1-2
[45] pkgconfig2.0.2
[46] reshape0.8.8
[47] XML3.98-1.16
[48] tidyselect0.2.5
[49] rlang0.3.1
[50] AnnotationDbi1.44.0
[51] munsell0.5.0
[52] cellranger1.1.0
[53] tools3.5.1
[54] cli1.0.1
[55] generics0.0.2
[56] RSQLite2.1.1
[57] broom0.5.1
[58] yaml2.2.0
[59] bit640.9-7
[60] beanplot1.2
[61] methylumi2.28.0
[62] ROC1.58.0
[63] doRNG1.7.1
[64] nor1mix1.2-3
[65] wateRmelon1.26.0
[66] xml21.2.0
[67] biomaRt2.38.0
[68] lookup0.0.0.9000
[69] compiler3.5.1
[70] rstudioapi0.9.0
[71] affyio1.52.0
[72] geneplotter1.60.0
[73] stringi1.2.4
[74] IlluminaHumanMethylation450kanno.ilmn12.hg190.6.0 [75] GenomicFeatures1.34.3
[76] lattice0.20-38
[77] Matrix1.2-15
[78] multtest2.38.0
[79] pillar1.3.1
[80] BiocManager1.30.4
[81] bitops1.0-6
[82] rtracklayer1.42.1
[83] affy1.60.0
[84] R62.3.0
[85] KernSmooth2.23-15
[86] nleqslv3.3.2
[87] codetools0.2-16
[88] MASS7.3-51.1
[89] assertthat0.2.0
[90] rhdf52.26.2
[91] openssl1.2.1
[92] pkgmaker0.27
[93] withr2.1.2
[94] GenomicAlignments1.18.1
[95] Rsamtools1.34.1
[96] GenomeInfoDbData1.2.0
[97] hms0.4.2
[98] quadprog1.5-5
[99] grid3.5.1
[100] base642.0
[101] DelayedMatrixStats1.4.0
[102] illuminaio0.24.0