MAI error - vector size limit

Hi all,

I am receiving this error message when running MAI to impute missing values:

gp.mai <- MAI(gp.raw, MCAR_algorithm="BPCA", MNAR_algorithm="Single", assay_ix=1)
Estimating pattern of missingness
Imposing missingness
Generating features
Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
  long vectors (argument 28) are not supported in .C

gp.raw is a SummarizedExperiment object with 9100x358 measurements. About 6% are missing values. Memory usage is substantial, and according to that message, the problem is that MAI exceeds the maximum vector length. Did anybody else run into this problem? I wonder whether there is an easy workaround, for example, using a subset of the data at the training step, but MAI() does not offer many options.

Best, Hans

R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS:   /mnt/mfs/cluster/bin/R-4.1.3/lib/
LAPACK: /mnt/mfs/cluster/bin/R-4.1.3/lib/

 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] caret_6.0-92                lattice_0.20-45            
 [3] SummarizedExperiment_1.24.0 GenomicRanges_1.46.1       
 [5] GenomeInfoDb_1.30.1         IRanges_2.28.0             
 [7] S4Vectors_0.32.4            MatrixGenerics_1.6.0       
 [9] matrixStats_0.62.0          preprocessCore_1.56.0      
[11] MAI_1.0.0                   imputeLCMD_2.1             
[13] impute_1.68.0               pcaMethods_1.86.0          
[15] Biobase_2.54.0              BiocGenerics_0.40.0        
[17] norm_1.0-10.0               tmvtnorm_1.5               
[19] gmm_1.6-6                   sandwich_3.0-2             
[21] Matrix_1.4-0                mvtnorm_1.1-3              
[23] ggplot2_3.3.6              

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0      colorspace_2.0-3       ellipsis_0.3.2        
 [4] class_7.3-20           XVector_0.34.0         fs_1.5.2              
 [7] proxy_0.4-27           listenv_0.8.0          prodlim_2019.11.13    
[10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
[13] codetools_0.2-18       splines_4.1.3          doParallel_1.0.17     
[16] itertools_0.1-3        jsonlite_1.8.0         pROC_1.18.0           
[19] broom_1.0.0            dbplyr_2.2.1           missForest_1.5        
[22] readr_2.1.2            compiler_4.1.3         httr_1.4.3            
[25] backports_1.4.1        assertthat_0.2.1       gargle_1.2.0          
[28] cli_3.3.0              tools_4.1.3            gtable_0.3.0          
[31] glue_1.6.2             GenomeInfoDbData_1.2.7 reshape2_1.4.4        
[34] dplyr_1.0.9            doRNG_1.8.2            Rcpp_1.0.9            
[37] cellranger_1.1.0       vctrs_0.4.1            nlme_3.1-155          
[40] iterators_1.0.14       timeDate_4021.104      gower_1.0.0           
[43] stringr_1.4.0          globals_0.15.1         rvest_1.0.2           
[46] lifecycle_1.0.1        rngtools_1.5.2         googlesheets4_1.0.0   
[49] future_1.27.0          MASS_7.3-55            zlibbioc_1.40.0       
[52] zoo_1.8-10             scales_1.2.0           ipred_0.9-13          
[55] hms_1.1.1              parallel_4.1.3         tidyverse_1.3.2       
[58] rpart_4.1.16           stringi_1.7.8          randomForest_4.7-1.1  
[61] foreach_1.5.2          e1071_1.7-11           hardhat_1.2.0         
[64] lava_1.6.10            rlang_1.0.4            pkgconfig_2.0.3       
[67] bitops_1.0-7           purrr_0.3.4            recipes_1.0.1         
[70] tidyselect_1.1.2       parallelly_1.32.1      plyr_1.8.7            
[73] magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[76] DelayedArray_0.20.0    DBI_1.1.3              pillar_1.8.0          
[79] haven_2.5.0            withr_2.5.0            survival_3.2-13       
[82] RCurl_1.98-1.8         nnet_7.3-17            tibble_3.1.8          
[85] future.apply_1.9.0     modelr_0.1.8           utf8_1.2.2            
[88] tzdb_0.3.0             grid_4.1.3             readxl_1.4.0          
[91] data.table_1.14.2      forcats_0.5.1          ModelMetrics_1.2.2.2  
[94] reprex_2.0.1           digest_0.6.29          tidyr_1.2.0           
[97] munsell_0.5.0

Here is a reproducible example. I believe any larger dataset will crash:

values <- rnorm(8000*300)
values[sample(1:(8000*300), size=20000)] <- NA
dataMat <- matrix(values, nrow=8000, ncol=300)
imputed <- MAI(dataMat, MCAR_algorithm="BPCA", MNAR_algorithm="Single")

It consumes a large amount of memory during training and then crashes with:

Estimating pattern of missingness
Imposing missingness
Generating features
Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
  long vectors (argument 28) are not supported in .C

Thank you for the reproducible example. The error arises because R's .C interface does not support long vectors (more than 2^31 - 1 elements), and one of the arrays the random forest passes through it exceeds that limit. I was able to get your example to work by decreasing the number of trees trained in the random forest. I added a parameter, forest_list_args, so that you can pass any random forest parameter you want to the model. I pushed the changes to GitHub, but I was unable to push to Bioconductor; I got the error "! [remote rejected] main -> main (hook declined) error: failed to push some refs to ''". I will need a couple of days to figure out what is happening there. In the meantime, please install the package through GitHub.
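
As a rough sketch of the workaround, the reproducible example above could be rerun with fewer trees. This assumes forest_list_args takes a named list of randomForest() arguments such as ntree; please check ?MAI in the GitHub version for the exact form and default. The seed and the value ntree = 100 are arbitrary choices for illustration. The check on formals() is only there because the Bioconductor release does not have the new parameter yet.

```r
# Reproducible example from above (seed added here for repeatability)
set.seed(1)
values <- rnorm(8000 * 300)
values[sample(seq_along(values), size = 20000)] <- NA
dataMat <- matrix(values, nrow = 8000, ncol = 300)

# Arguments forwarded to the random forest. ntree = 100 is an arbitrary
# illustrative value (assumption), not a tuned recommendation.
forest_args <- list(ntree = 100)

# Only call MAI if the installed version exposes forest_list_args
has_arg <- requireNamespace("MAI", quietly = TRUE) &&
  "forest_list_args" %in% names(formals(MAI::MAI))

if (has_arg) {
  imputed <- MAI::MAI(dataMat,
                      MCAR_algorithm = "BPCA",
                      MNAR_algorithm = "Single",
                      forest_list_args = forest_args)
} else {
  message("Installed MAI lacks forest_list_args; install the GitHub version first.")
}
```

If the error still appears, lowering ntree further is the first thing to try, since fewer trees mean smaller arrays handed to .C.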

Let me know if anything else comes up.

Best of luck, Jonathan.

