MAI error - vector size limit
@hans-ulrich-klein-1945
United States

Hi all,

I am receiving this error message when running MAI to impute missing values:

gp.mai <- MAI(gp.raw, MCAR_algorithm="BPCA", MNAR_algorithm="Single", assay_ix=1)
Estimating pattern of missingness
Imposing missingness
Generating features
Training
Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
  long vectors (argument 28) are not supported in .C

gp.raw is a SummarizedExperiment object with 9100 x 358 measurements, about 6% of which are missing. Memory usage is substantial, and according to the error message, the problem is that a vector created during training exceeds the maximum vector length. Has anybody else run into this problem? I wonder whether there is an easy workaround, for example using a subset of the data at the training step, but MAI() does not offer many options.

Best, Hans


sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS:   /mnt/mfs/cluster/bin/R-4.1.3/lib/libRblas.so
LAPACK: /mnt/mfs/cluster/bin/R-4.1.3/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] caret_6.0-92                lattice_0.20-45            
 [3] SummarizedExperiment_1.24.0 GenomicRanges_1.46.1       
 [5] GenomeInfoDb_1.30.1         IRanges_2.28.0             
 [7] S4Vectors_0.32.4            MatrixGenerics_1.6.0       
 [9] matrixStats_0.62.0          preprocessCore_1.56.0      
[11] MAI_1.0.0                   imputeLCMD_2.1             
[13] impute_1.68.0               pcaMethods_1.86.0          
[15] Biobase_2.54.0              BiocGenerics_0.40.0        
[17] norm_1.0-10.0               tmvtnorm_1.5               
[19] gmm_1.6-6                   sandwich_3.0-2             
[21] Matrix_1.4-0                mvtnorm_1.1-3              
[23] ggplot2_3.3.6              

loaded via a namespace (and not attached):
 [1] googledrive_2.0.0      colorspace_2.0-3       ellipsis_0.3.2        
 [4] class_7.3-20           XVector_0.34.0         fs_1.5.2              
 [7] proxy_0.4-27           listenv_0.8.0          prodlim_2019.11.13    
[10] fansi_1.0.3            lubridate_1.8.0        xml2_1.3.3            
[13] codetools_0.2-18       splines_4.1.3          doParallel_1.0.17     
[16] itertools_0.1-3        jsonlite_1.8.0         pROC_1.18.0           
[19] broom_1.0.0            dbplyr_2.2.1           missForest_1.5        
[22] readr_2.1.2            compiler_4.1.3         httr_1.4.3            
[25] backports_1.4.1        assertthat_0.2.1       gargle_1.2.0          
[28] cli_3.3.0              tools_4.1.3            gtable_0.3.0          
[31] glue_1.6.2             GenomeInfoDbData_1.2.7 reshape2_1.4.4        
[34] dplyr_1.0.9            doRNG_1.8.2            Rcpp_1.0.9            
[37] cellranger_1.1.0       vctrs_0.4.1            nlme_3.1-155          
[40] iterators_1.0.14       timeDate_4021.104      gower_1.0.0           
[43] stringr_1.4.0          globals_0.15.1         rvest_1.0.2           
[46] lifecycle_1.0.1        rngtools_1.5.2         googlesheets4_1.0.0   
[49] future_1.27.0          MASS_7.3-55            zlibbioc_1.40.0       
[52] zoo_1.8-10             scales_1.2.0           ipred_0.9-13          
[55] hms_1.1.1              parallel_4.1.3         tidyverse_1.3.2       
[58] rpart_4.1.16           stringi_1.7.8          randomForest_4.7-1.1  
[61] foreach_1.5.2          e1071_1.7-11           hardhat_1.2.0         
[64] lava_1.6.10            rlang_1.0.4            pkgconfig_2.0.3       
[67] bitops_1.0-7           purrr_0.3.4            recipes_1.0.1         
[70] tidyselect_1.1.2       parallelly_1.32.1      plyr_1.8.7            
[73] magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
[76] DelayedArray_0.20.0    DBI_1.1.3              pillar_1.8.0          
[79] haven_2.5.0            withr_2.5.0            survival_3.2-13       
[82] RCurl_1.98-1.8         nnet_7.3-17            tibble_3.1.8          
[85] future.apply_1.9.0     modelr_0.1.8           utf8_1.2.2            
[88] tzdb_0.3.0             grid_4.1.3             readxl_1.4.0          
[91] data.table_1.14.2      forcats_0.5.1          ModelMetrics_1.2.2.2  
[94] reprex_2.0.1           digest_0.6.29          tidyr_1.2.0           
[97] munsell_0.5.0
randomForest MAI

Here is a reproducible example. I believe any dataset of this size or larger will crash:

library(MAI) 
values <- rnorm(8000*300)
values[sample(1:(8000*300), size=20000)] <- NA
dataMat <- matrix(values, nrow=8000, ncol=300)
imputed <- MAI(dataMat, MCAR_algorithm="BPCA", MNAR_algorithm="Single")

This consumes a large amount of memory during training and then crashes with:

Estimating pattern of missingness
Imposing missingness
Generating features
Training
Error in randomForest.default(x, y, mtry = min(param$mtry, ncol(x)), ...) : 
  long vectors (argument 28) are not supported in .C
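
For context on the error message: in R, a "long vector" is one with more than 2^31 - 1 elements, and the .C interface rejects such arguments. The snippet below is only a rough sketch of that limit. The helper needs_long_vector is hypothetical, and the second pair of sizes is purely illustrative; it is not the actual size of the array that randomForest allocates here.

```r
# Largest length a regular (non-long) vector can have: 2^31 - 1.
max_len <- .Machine$integer.max

# Hypothetical helper: would an array with these dimensions need a long vector?
# Dimensions are multiplied as doubles to avoid integer overflow.
needs_long_vector <- function(...) {
  prod(as.numeric(c(...))) > max_len
}

needs_long_vector(8000, 300)  # the example data matrix itself: FALSE
needs_long_vector(5e6, 500)   # illustrative sizes only: TRUE
```

The data matrix itself is far below the limit, so the oversized vector must be created somewhere during training rather than being the input.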
@67e42748
United States

Thank you for the reproducible example. This error comes from a limit in R: the .C interface that randomForest uses to call its compiled code does not accept long vectors (vectors with more than 2^31 - 1 elements), and one of the arguments passed during training exceeds that length. I was able to get the example you provided to work by decreasing the number of trees trained in the random forest. I added a parameter, forest_list_args, so that you can pass any random forest parameter you want to the model, and I pushed the changes to https://github.com/KechrisLab/MAI.

I was unable to push to Bioconductor; the push was rejected with:

! [remote rejected] main -> main (hook declined)
error: failed to push some refs to 'git.bioconductor.org:packages/MAI.git'

I will need a couple of days to figure out what is happening there. In the meantime, please install the package from GitHub.
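
A minimal sketch of how the reduced tree count might be passed, assuming forest_list_args accepts a named list of randomForest arguments (check ?MAI in the GitHub version to confirm). The wrapper impute_with_rf_args is hypothetical, and ntree = 100 is an illustrative value, not the one used to get the example working.

```r
# Install the development version from GitHub (run once); assumes the
# 'remotes' package is available:
# remotes::install_github("KechrisLab/MAI")

# randomForest arguments to forward. ntree = 100 is an illustrative value.
rf_args <- list(ntree = 100)

# Hypothetical wrapper: checks that the installed MAI exposes
# forest_list_args before calling it.
impute_with_rf_args <- function(data_mat, rf_args) {
  if (!requireNamespace("MAI", quietly = TRUE)) {
    stop("MAI is not installed")
  }
  if (!"forest_list_args" %in% names(formals(MAI::MAI))) {
    stop("this MAI version has no forest_list_args; install from GitHub")
  }
  MAI::MAI(data_mat,
           MCAR_algorithm = "BPCA",
           MNAR_algorithm = "Single",
           forest_list_args = rf_args)
}

# Usage with the matrix from the reproducible example:
# imputed <- impute_with_rf_args(dataMat, rf_args)
```

If the error persists, lowering ntree further is the first thing to try, since that is the change that made the example run.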

Let me know if anything else comes up.

Best of luck, Jonathan.
