Generating a DelayedArray with a size greater than .Machine$integer.max
Koki • 0
@koki-7888
Japan

When the DelayedArray is extremely large, ReshapedHDF5Array does not work, as shown below.

I think the reason is that R's native integer type is 32-bit, so values greater than 2^31-1 (.Machine$integer.max) cannot be represented as integers.
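To make the limit concrete, here is a minimal base R sketch (no HDF5Array involved) comparing the two array lengths used in this post against .Machine$integer.max. It only illustrates the overflow itself and assumes nothing about how DelayedArray coerces 'dim' internally.

```r
# Base R only; no packages needed.
int_max <- .Machine$integer.max          # 2147483647, i.e. 2^31 - 1

n_small <- prod(c(1000, 1000, 1000))     # 1e9, computed as a double
n_large <- prod(c(2000, 2000, 1000))     # 4e9, computed as a double

fits_small <- n_small <= int_max         # TRUE: 1e9 fits in a 32-bit integer
fits_large <- n_large <= int_max         # FALSE: 4e9 does not fit

# Coercing the large value to integer yields NA (with a warning).
coerced <- suppressWarnings(as.integer(n_large))
is.na(coerced)                           # TRUE

# A double still holds the value exactly, because 4e9 < 2^53.
n_large == 4e9                           # TRUE
```

So the length itself is representable in R as a double; it is only the coercion to integer that fails.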

How about making ReshapedHDF5Array accept bit64's integer64? cf. https://stackoverflow.com/questions/14589354/struggling-with-integers-maximum-integer-size

library("HDF5Array")
library("DelayedRandomArray")
library("bit64")

# This works (1000*1000*1000 = 1e9 < 2^31-1)
dim1 <- c(1000,1000,1000)
l1 <- as.integer64(prod(dim1))

darr1 <- RandomBinomArray(dim=dim1, size=1, prob=0.2)
tmpfile1 <- paste0(tempfile(), ".h5")
writeHDF5Array(darr1, tmpfile1, "tmp")
out1 <- ReshapedHDF5Array(tmpfile1, "tmp", l1)

# This does not work (2000*2000*1000 = 4e9 > 2^31-1)
dim2 <- c(2000,2000,1000)
l2 <- as.integer64(prod(dim2))

darr2 <- RandomBinomArray(dim=dim2, size=1, prob=0.2)
tmpfile2 <- paste0(tempfile(), ".h5")
writeHDF5Array(darr2, tmpfile2, "tmp")
out2 <- ReshapedHDF5Array(tmpfile2, "tmp", l2)
# Error in DelayedArray:::normarg_dim(dim) :
#   'dim' cannot contain negative or NA values
# In addition: Warning message:
# In as.integer.integer64(dim) : NAs produced by integer overflow

sessionInfo()
# R version 4.1.0 (2021-05-18)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 20.04.2 LTS
# 
# Matrix products: default
# BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
# [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
# [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
# [9] LC_ADDRESS=C               LC_TELEPHONE=C
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] parallel  stats4    stats     graphics  grDevices utils     datasets
# [8] methods   base
#
# other attached packages:
# [1] bit64_4.0.5              bit_4.0.4                DelayedRandomArray_1.1.0
# [4] HDF5Array_1.21.0         rhdf5_2.37.0             DelayedArray_0.19.0
# [7] IRanges_2.27.0           S4Vectors_0.31.0         MatrixGenerics_1.5.0
# [10] matrixStats_0.58.0       BiocGenerics_0.39.0      Matrix_1.3-3
#
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.6         lattice_0.20-44    rhdf5filters_1.5.0 grid_4.1.0
# [5] dqrng_0.3.0        Rhdf5lib_1.15.0    tools_4.1.0        compiler_4.1.0

Koki

I think this is not a minor problem, but one that is bound to occur sooner or later when handling huge array data with DelayedArray. For example, the commonly used 1.3 million mouse brain cell dataset of 10X Chromium has 27998 genes × 1306127 cells ≈ 3.66 × 10^10 elements, and I expect something to go wrong when data of that size is written into a DelayedArray using block processing. https://bioconductor.org/packages/release/data/experiment/html/TENxBrainData.html
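As a quick check of that arithmetic, the following base R sketch computes the element count from the dimensions quoted above and compares it with .Machine$integer.max. It does not load TENxBrainData; the two dimensions are simply typed in.

```r
# Dimensions quoted above for the 1.3 million mouse brain cell dataset.
n_genes <- 27998
n_cells <- 1306127

# Double arithmetic: no overflow.
n_elements <- n_genes * n_cells

exceeds_int <- n_elements > .Machine$integer.max   # TRUE
ratio <- n_elements / .Machine$integer.max         # roughly 17

# The same product in integer arithmetic overflows to NA (with a warning).
int_product <- suppressWarnings(27998L * 1306127L)
is.na(int_product)                                 # TRUE
```

The total length is therefore about 17 times larger than the largest value an R integer can hold.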

In the first place, why does DelayedArray require the user to supply integers? I have written a lot of code like "1L" or "as.integer", but I think it would be better to accept numeric input and coerce it to integer inside DelayedArray. I think that would cause less trouble. I would also like values greater than 2^31-1 to be converted to integer64.
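Roughly, what I have in mind is something like the sketch below. normalize_dim() is a hypothetical helper written only for illustration; it is not part of DelayedArray or HDF5Array. To stay within base R it falls back to double (exact for whole numbers up to 2^53) instead of integer64; bit64::as.integer64() could be substituted in that branch.

```r
# Hypothetical helper, NOT an existing DelayedArray function.
# Accepts plain numeric input and returns integer when every value fits,
# otherwise keeps the values as double (assumption: double is acceptable
# downstream; integer64 would be an alternative).
normalize_dim <- function(dim) {
    stopifnot(is.numeric(dim), length(dim) >= 1L)
    if (anyNA(dim) || any(dim < 0) || any(dim != trunc(dim))) {
        stop("'dim' must contain non-negative whole numbers")
    }
    if (all(dim <= .Machine$integer.max)) {
        as.integer(dim)   # safe: every value fits in a 32-bit integer
    } else {
        as.double(dim)    # too large for integer; keep as double
    }
}

d_small <- normalize_dim(c(1000, 1000, 1000))   # integer vector
d_large <- normalize_dim(4e9)                   # stays double, value 4e9
```

With this kind of normalization, users would not need to write "1L" or "as.integer" themselves, and too-large values would be kept intact rather than turned into NA.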
