I'm getting a rather cryptic error from ENmix::bmiq.mc():
Error in ENmix::bmiq.mc(forDNAm_preprocess2, nCores = 2) : BMIQ estimates encountered error, try to run it again
I've narrowed the problem down over the past couple of weeks and am getting closer to the cause.
It appears that running the following on only buccal samples works. However, blood samples cause
ENmix::bmiq.mc() to fail with the above error.
    library(minfi)
    library(ENmix)
    library(readr)  # provides read_rds()

    # load rgSet
    rgSet_1 <- read_rds("file1.rds")

    # get beta values matrix
    betas_1 = minfi::getBeta(rgSet_1)

    # check for NA's
    sum(is.na(betas_1))  # 2643

    # preprocess rgSet
    forDNAm_preprocess1 = minfi::preprocessNoob(rgSet_1)

    # check for NA betas after preprocessNoob
    preprocessBetas_1 = getBeta(forDNAm_preprocess1)
    sum(is.na(preprocessBetas_1))  # 0 NA's

    # BMIQ normalize preprocessed rgSet
    forDNAm_1 = ENmix::bmiq.mc(forDNAm_preprocess1, nCores = 2)
I was trying to find out which blood samples (one, several, or all) are causing the issue, so I wanted to run the samples one by one. This should work in theory, as
BMIQ normalizes type-2 probe bias for each sample separately and independently of all other samples. Unfortunately, the
minfi::preprocessNoob(rgSet) step, which precedes it, requires more than one sample; otherwise it gives the error:
Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0
So I tried the following approach. First, I ran it with only two samples:
    2176_Blood  B1  WG5121380  2176_Blood  200923040002  R02C01
    2005_Blood  C1  WG5121380  2005_Blood  200923040002  R03C01
ENmix::bmiq.mc() ran with no problems. So then I tried three samples:
    2176_Blood  B1  WG5121380  2176_Blood  200923040002  R02C01
    2005_Blood  C1  WG5121380  2005_Blood  200923040002  R03C01
    2121_Blood  E2  WG5121380  2121_Blood  200923040004  R05C01
This failed with the error. Okay, so sample
2121_Blood is "bad", right? Not so fast. If I run either of the following pairs, it works:
    2005_Blood  C1  WG5121380  2005_Blood  200923040002  R03C01
    2121_Blood  E2  WG5121380  2121_Blood  200923040004  R05C01

OR

    2176_Blood  B1  WG5121380  2176_Blood  200923040002  R02C01
    2121_Blood  E2  WG5121380  2121_Blood  200923040004  R05C01
I've tried various combinations, and it seems that including three or more (blood) samples causes BMIQ to fail.
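Since preprocessNoob cannot be run on a single sample, one way to isolate a failing sample might be to preprocess the full set once and then loop over the columns of the resulting beta matrix, catching errors per sample. Below is a minimal base-R sketch of that idea. The names `run_per_sample` and `toy_normalise` are hypothetical and not part of minfi, ENmix or wateRmelon; `toy_normalise` is only a stand-in for whatever single-sample BMIQ call would actually be used, and the toy matrix stands in for the real beta values.

```r
# Hypothetical helper (not from any package): apply a per-sample function to
# each column of a matrix, catching errors so that one failing sample does
# not abort the rest.
run_per_sample <- function(mat, normalise_one) {
  results <- lapply(colnames(mat), function(id) {
    tryCatch(
      list(sample = id, ok = TRUE,
           value = normalise_one(mat[, id]), message = NA_character_),
      error = function(e) {
        list(sample = id, ok = FALSE,
             value = NULL, message = conditionMessage(e))
      }
    )
  })
  names(results) <- colnames(mat)
  results
}

# Toy stand-in for the real normaliser (assumption): it fails when a sample
# has no variance, otherwise rescales the values to [0, 1].
toy_normalise <- function(beta) {
  if (sd(beta) == 0) stop("did not converge")
  (beta - min(beta)) / (max(beta) - min(beta))
}

# Toy beta matrix: three "samples"; the third is deliberately degenerate.
betas <- cbind(
  s1 = c(0.1, 0.5, 0.9),
  s2 = c(0.2, 0.4, 0.8),
  s3 = c(0.5, 0.5, 0.5)
)

res <- run_per_sample(betas, toy_normalise)

# Names of the samples whose fit raised an error
failed <- names(res)[!vapply(res, function(r) r$ok, logical(1))]
print(failed)  # [1] "s3"
```

With real data, `betas` would presumably be the matrix returned by getBeta() on the preprocessNoob output, and `normalise_one` would wrap a single-sample BMIQ fit. I have not checked whether ENmix::bmiq.mc() itself accepts a one-column object, so that part is left open.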
The author of ENmix wrote a wrapper around the BMIQ method from the wateRmelon package. I contacted Leonard Schalkwyk, author of wateRmelon, but he said that it is itself a wrapper around the original BMIQ method written by Andrew Teschendorff.
Leonard told me:
"He wrote that method to deal with some quite nontypical data and I wouldn't recommend it for general use. The mixture model is fit by an iterative process which is quite computationally heavy. In the original paper by Andrew the number of iterations used wouldn't be practical for a data set of any size. I can't say the default parameters in the wateRmelon version are particularly clever or guaranteed to be appropriate and the number of iterations may not be enough for it to converge in your case. That would probably be the first place to look -- if you have a reason to use
BMIQ. Otherwise you're probably better off with any straightforward quantile normaliser such as
I had also reached out to Andrew, who said:
"The optimization can sometimes fail to converge, which could cause the wrapper function which runs on all samples using parallel to fail if one sample fails to converge. It very rarely happens."
I suppose the reason for using BMIQ is that we've used this script for other cohorts and want to keep it the same for consistency's sake when we try to publish. I'm not sure whether I should try to tune the number of iterations, or switch to a different normaliser/preprocessing step altogether and try to explain this to the reviewers.
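Before switching normalisers, one cheap thing to try might be simply repeating the failing fit, since the error message itself says to "try to run it again". That makes me assume the fit involves some randomness, although I have not confirmed this in the source. The sketch below retries a function under different random seeds and records which seed succeeded, so the run stays reproducible. `retry_with_seeds` and `toy_fit` are hypothetical names; `toy_fit` is only a stand-in for the real BMIQ call.

```r
# Hypothetical helper (not from any package): call `fit` repeatedly, each
# time under a different random seed, and return the first result that does
# not raise an error, together with the seed that produced it.
retry_with_seeds <- function(fit, seeds) {
  for (s in seeds) {
    set.seed(s)
    out <- tryCatch(fit(), error = function(e) NULL)
    if (!is.null(out)) {
      return(list(seed = s, value = out))
    }
  }
  stop("all attempts failed")
}

# Toy stand-in for a stochastic fit (assumption): it errors whenever the
# random draw is below 0.5, otherwise returns the draw.
toy_fit <- function() {
  x <- runif(1)
  if (x < 0.5) stop("did not converge")
  x
}

ans <- retry_with_seeds(toy_fit, seeds = 1:20)
print(ans$seed)   # the seed that produced a successful fit
print(ans$value)  # the value returned under that seed
```

If a sample still fails under every seed, that would point towards a genuine convergence problem for that sample rather than bad luck, and tuning the iteration settings or changing the normaliser would be the next thing to look at.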
R version 3.5.1 (2018-07-02)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
 LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 stats4 parallel stats graphics grDevices utils datasets methods

other attached packages:
 IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
 IlluminaHumanMethylationEPICmanifest_0.3.0

loaded via a namespace (and not attached):
 backports_1.1.3
 IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
 GenomicFeatures_1.34.3