Dear all,
I am trying to build a BSseq object from 15 genome-wide cytosine reports generated by Bismark, using read.bismark. I have used it before in similar scenarios with similar files and it worked just fine. The only difference now is that my reports are bigger (around 5 GB compressed). I get some of the following errors:
'''System errno 22 unmapping file: Invalid argument System errno 22 unmapping file: Invalid argument Stop worker failed with the error: wrong args for environment subassignment Error: BiocParallel errors 0 remote errors, element index: 14 unevaluated and other errors first remote error:'''
or
'''Error in !bpok : invalid argument type'''
I have seen previous reports of insufficient memory triggering !bpok errors; however, my server has 504 GB of RAM and the process never comes close to using all of it.
Trying to solve the problem, I used the option to write to disk (BACKEND = "HDF5Array", commented out in the code below), but even with a single file the process hangs.
Thanks in advance,
Juan
library(foreach)
library(dmrseq)
library(BiocParallel)
library(HDF5Array)
register(MulticoreParam(10))
# Read metadata
metadata <- as.data.frame.array(data.table::fread("tissue_listSamples_dmrseq.csv", nThread = 10))
# Same order for metadata file and CpG_report file in file.list
#file.list <- metadata$file_name
#test_data<- as.data.frame(metadata$SampleID)
rownames(metadata) <- metadata$SampleID
cpg_txt.dir <- "/home/rstudio/AA/samples"
dir <- "/home/rstudio/AA/results"
file.list <- file.path(cpg_txt.dir, metadata$file_name)
bsseq.obj <- read.bismark(files = file.list[1],
                          rmZeroCov = FALSE,
                          strandCollapse = TRUE,
                          nThread = 2,
                          #BACKEND = "HDF5Array",
                          #dir = "dir",
                          #replace = FALSE,
                          verbose = TRUE)
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
[4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] HDF5Array_1.30.1 rhdf5_2.46.1 DelayedArray_0.28.0
[4] SparseArray_1.2.4 S4Arrays_1.2.1 abind_1.4-5
[7] Matrix_1.6-1.1 BiocParallel_1.36.0 dmrseq_1.22.1
[10] bsseq_1.38.0 SummarizedExperiment_1.32.0 Biobase_2.62.0
[13] MatrixGenerics_1.14.0 matrixStats_1.2.0 GenomicRanges_1.54.1
[16] GenomeInfoDb_1.38.8 IRanges_2.36.0 S4Vectors_0.40.2
[19] BiocGenerics_0.48.1 foreach_1.5.2
loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7
[3] biomaRt_2.58.2 permute_0.9-7
[5] rlang_1.1.3 magrittr_2.0.3
[7] compiler_4.3.2 RSQLite_2.3.5
[9] GenomicFeatures_1.54.4 DelayedMatrixStats_1.24.0
[11] reshape2_1.4.4 png_0.1-8
[13] vctrs_0.6.5 stringr_1.5.1
[15] pkgconfig_2.0.3 crayon_1.5.2
[17] fastmap_1.1.1 dbplyr_2.4.0
[19] XVector_0.42.0 ellipsis_0.3.2
[21] utf8_1.2.4 Rsamtools_2.18.0
[23] promises_1.2.1 tzdb_0.4.0
[25] bit_4.0.5 zlibbioc_1.48.2
[27] cachem_1.0.8 progress_1.2.3
[29] blob_1.2.4 later_1.3.2
[31] rhdf5filters_1.14.1 Rhdf5lib_1.24.2
[33] interactiveDisplayBase_1.40.0 prettyunits_1.2.0
[35] parallel_4.3.2 R6_2.5.1
[37] RColorBrewer_1.1-3 stringi_1.8.3
[39] limma_3.58.1 rtracklayer_1.62.0
[41] Rcpp_1.0.12 iterators_1.0.14
[43] R.utils_2.12.3 readr_2.1.5
[45] splines_4.3.2 httpuv_1.6.14
[47] tidyselect_1.2.0 rstudioapi_0.15.0
[49] yaml_2.3.8 codetools_0.2-19
[51] curl_5.2.0 doRNG_1.8.6
[53] plyr_1.8.9 regioneR_1.34.0
[55] lattice_0.21-9 tibble_3.2.1
[57] shiny_1.8.0 KEGGREST_1.42.0
[59] BiocFileCache_2.10.2 xml2_1.3.6
[61] Biostrings_2.70.3 pillar_1.9.0
[63] BiocManager_1.30.22 filelock_1.0.3
[65] rngtools_1.5.2 generics_0.1.3
[67] RCurl_1.98-1.14 ggplot2_3.5.0
[69] hms_1.1.3 BiocVersion_3.18.1
[71] sparseMatrixStats_1.14.0 munsell_0.5.0
[73] scales_1.3.0 bumphunter_1.44.0
[75] gtools_3.9.5 xtable_1.8-4
[77] glue_1.7.0 tools_4.3.2
[79] AnnotationHub_3.10.0 BiocIO_1.12.0
[81] data.table_1.15.0 BSgenome_1.70.2
[83] locfit_1.5-9.8 GenomicAlignments_1.38.2
[85] XML_3.99-0.16.1 grid_4.3.2
[87] AnnotationDbi_1.64.1 colorspace_2.1-0
[89] nlme_3.1-163 GenomeInfoDbData_1.2.11
[91] restfulr_0.0.15 annotatr_1.28.0
[93] cli_3.6.2 rappdirs_0.3.3
[95] fansi_1.0.6 dplyr_1.1.4
[97] gtable_0.3.4 outliers_0.15
[99] R.methodsS3_1.8.2 digest_0.6.34
[101] rjson_0.2.21 memoise_2.0.1
[103] htmltools_0.5.7 R.oo_1.26.0
[105] lifecycle_1.0.4 httr_1.4.7
[107] statmod_1.5.0 mime_0.12
[109] bit64_4.0.5
Hi Peter, could you please send me your email address so I can share it privately?
Thanks for your quick answer!
Greetings,
Juan
Sure, it's actually available in the DESCRIPTION file that is included in bsseq: https://github.com/hansenlab/bsseq/blob/d32c2c6709fac68b59578fac79d0eda385104585/DESCRIPTION#L10