Error: C stack usage of HDF5Array
Koki ▴ 10
@koki-7888
Japan

I tried to create a very large three-dimensional array.

It cannot be created as a standard array because of an out-of-memory error, so I used HDF5Array, but the writing step stopped with a C stack error.

Do I need some special setting when using HDF5Array?

I changed the C stack size with ulimit -s, but the error persists.

Thank you in advance.

library("HDF5Array")

# cf. https://rdrr.io/bioc/DelayedArray/man/write_block.html
.sarray <- function(dim){
    dim <- as.integer(dim)
    setAutoRealizationBackend("HDF5Array")
    sink <- AutoRealizationSink(dim, as.sparse=TRUE)
    close(sink)
    as(sink, "DelayedArray")
}

human <- array(runif(13889*1977), dim=c(13889, 1977))
mouse <- array(runif(13889*1907), dim=c(13889, 1907))

new_modes <- c(ncol(human), ncol(mouse), nrow(human))
darr <- .sarray(new_modes)
for(i in seq(dim(darr)[3])){
    print(paste0(i, " / ", dim(darr)[3]))
    darr[,,i] <- outer(human[i,], mouse[i,])
}

# After several steps (e.g. 90 / 13889),
# the calculation stops with the following error:
# Error: C stack usage  1947092 is too close to the limit

I'm using the devel versions of R and Bioconductor.

sessionInfo()
R Under development (unstable) (2021-03-18 r80099)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] rTensor_1.4.1        testthat_3.0.2       BiocSingular_1.7.2
 [4] HDF5Array_1.19.15    rhdf5_2.35.2         DelayedArray_0.17.11
 [7] IRanges_2.25.10      S4Vectors_0.29.17    MatrixGenerics_1.3.1
[10] matrixStats_0.58.0   BiocGenerics_0.37.4  Matrix_1.3-2
[13] BiocManager_1.30.12

loaded via a namespace (and not attached):
 [1] rprojroot_2.0.2     compiler_4.1.0      tools_4.1.0
 [4] rsvd_1.0.5          Rcpp_1.0.6          rhdf5filters_1.3.4
 [7] beachmat_2.7.7      irlba_2.3.3         desc_1.3.0
[10] ScaledMatrix_0.99.2 BiocParallel_1.25.5 rlang_0.4.10
[13] lattice_0.20-41     Rhdf5lib_1.13.4     magrittr_2.0.1
[16] R6_2.5.0            withr_2.4.2         crayon_1.4.1
[19] grid_4.1.0
DelayedArray HDF5Array • 1.1k views
@herve-pages-1542
Seattle, WA, United States

Hi,

You never wrote anything to your RealizationSink object!

Here is why:

  • Your .sarray() function creates the RealizationSink object, then immediately closes it (without writing anything to it), and finally turns the sink object into a DelayedArray object.
  • Then you try to modify the DelayedArray in a loop with darr[,,i] <- .... But it's too late! These modifications are treated as delayed operations that are piling up on the DelayedArray object, and not as write operations to the HDF5 file.
  • It's important to keep in mind that a DelayedArray object always treats the file that it's pointing at as read-only. In other words, you can never modify the file by operating on the DelayedArray object.
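
To make the points above concrete, here is a minimal sketch on toy data (the 3 x 3 matrix and the object names h5 and m are made up for illustration). It assumes that writeHDF5Array() writes to a temporary HDF5 file by default and that showtree() from DelayedArray prints the tree of delayed operations:

```r
library(HDF5Array)

# Write a small matrix of zeros to a temporary HDF5 file (toy data).
h5 <- writeHDF5Array(matrix(0, nrow = 3, ncol = 3))

# "Modify" a copy of the DelayedArray object. The right-hand side has
# exactly the dimensions of the selection (3 x 1).
m <- h5
m[, 1] <- matrix(1, nrow = 3, ncol = 1)

# The subassignment is recorded as a delayed operation on top of the seed.
showtree(m)

# The modification is applied on the fly when reading from m ...
first_col_modified <- all(as.array(m)[, 1] == 1)

# ... but the data in the HDF5 file, as seen through h5, is unchanged.
file_untouched <- all(as.array(h5) == 0)
```

Every further darr[,,i] <- ... adds one more delayed operation of this kind on top of the previous ones.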

As explained in the man page for write_block and RealizationSink objects, things must be done in the following order:

  1. Create the realization sink.

  2. Write blocks of array data to the realization sink with one or several calls to write_block().

  3. Close the realization sink with close().

  4. Coerce the realization sink to DelayedArray.

Note that you must use write_block() to write data to the realization sink.
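
The four steps can be sketched as follows. This is only an illustration on toy data, not a tuned solution: the small human and mouse matrices are made up, and writing one slab per row is just the simplest choice of grid (larger blocks would mean fewer writes).

```r
library(HDF5Array)

set.seed(1)
human <- matrix(runif(5 * 4), nrow = 5)  # toy data: 5 rows x 4 columns
mouse <- matrix(runif(5 * 3), nrow = 5)  # toy data: 5 rows x 3 columns

# 1. Create the realization sink.
setAutoRealizationBackend("HDF5Array")
sink <- AutoRealizationSink(c(ncol(human), ncol(mouse), nrow(human)))

# 2. Write blocks to the sink. Each viewport of this grid covers one
#    slab along the third dimension.
grid <- RegularArrayGrid(dim(sink),
                         spacings = c(ncol(human), ncol(mouse), 1L))
for (bid in seq_along(grid)) {
    viewport <- grid[[bid]]
    # The block must have the same dimensions as the viewport.
    block <- array(outer(human[bid, ], mouse[bid, ]), dim = dim(viewport))
    sink <- write_block(sink, viewport, block)
}

# 3. Close the sink.
close(sink)

# 4. Coerce it to DelayedArray.
darr <- as(sink, "DelayedArray")
dim(darr)
```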

Please check the examples in the man page. They have plenty of comments that cover many details.

Best,

H.


OK, I didn't really understand the documentation at first because it was a little technical for me, but I think I finally got it.

I see that I need to align the blocks of the in-memory objects with the blocks to be written to the HDF5 file.

library("HDF5Array")

human <- as(array(runif(13889*1977), dim=c(13889, 1977)), "HDF5Array")
mouse <- as(array(runif(13889*1907), dim=c(13889, 1907)), "HDF5Array")

new_modes <- as.integer(c(1977, 1907, 13889))
human_grid <- rowAutoGrid(human, nrow=30)
mouse_grid <- rowAutoGrid(mouse, nrow=30)

setAutoRealizationBackend("HDF5Array")
sink <- AutoRealizationSink(new_modes)
sink_grid <- RegularArrayGrid(dim(sink), spacings=c(1977, 1907, 30))

stopifnot(length(sink_grid) == length(human_grid))
stopifnot(length(sink_grid) == length(mouse_grid))

block_outer <- function(A, B){
    stopifnot(nrow(A) == nrow(B))
    arr <- array(0, dim=c(ncol(A), ncol(B), nrow(A)))
    for(i in seq(dim(arr)[3])){
        arr[,,i] <- outer(A[i,], B[i, ])
    }
    arr
}

FUN <- function(viewport, sink) {
    bid <- currentBlockId()
    human_block <- read_block(human, human_grid[[bid]])
    mouse_block <- read_block(mouse, mouse_grid[[bid]])
    block <- block_outer(human_block, mouse_block)
    write_block(sink, viewport, block)
}
system.time(sink <- gridReduce(FUN, sink_grid, sink, verbose=TRUE))

close(sink)
M <- as(sink, "DelayedArray")
dim(M)
M[1:2,1:2,1:2]