Temporary directory setting for DelayedArray
Koki ▴ 10
@koki-7888
Japan

I implemented some functions based on block processing, referring to the write_block documentation.

https://rdrr.io/bioc/DelayedArray/man/write_block.html

However, the /tmp directory has limited capacity, so there is a risk of running out of disk space when handling large data. I would therefore like to know whether the directory can be set globally in advance so that /tmp is not used.

writeHDF5Array seems to let me specify the output file with its filepath argument, but is there a way to specify the file path for DelayedArray functions such as setAutoRealizationBackend("HDF5Array") and AutoRealizationSink?

For example, I thought I could use the path() setter to change the directory, but I couldn't.

library("DelayedArray")
library("HDF5Array")
B3 <- array(runif(2*3*4), dim=c(2,3,4))
B3 <- as(B3, "HDF5Array")
path(B3) <- "temp.h5" # Error

HDF5Array DelayedArray • 211 views

I assumed that changing tempdir() would also change the temporary directory used inside DelayedArray, so I installed the unixtools package and used its set.tempdir function, but it didn't work.

remotes::install_github("s-u/unixtools")

library("DelayedTensor")
library("HDF5Array")
library("unixtools")

tmpdir <- paste(sample(c(letters,1:9), 10), collapse="")
dir.create(tmpdir)
set.tempdir(tmpdir)
tempdir()

B3 <- array(runif(2*3*4), dim=c(2,3,4))
B3 <- as(B3, "HDF5Array") # R freezes at this step

Mike Smith ★ 5.2k
@mike-smith
EMBL Heidelberg / de.NBI

I think you need to look at setHDF5DumpDir() from the HDF5Array package.

Here's a little exploration of what happens when you use that function:

library("HDF5Array")
## define a location
hdf5_temp_dir <- "/tmp/testing/HDF5Array"
## the location must exist for HDF5Array to use it
dir.create(hdf5_temp_dir, recursive = TRUE)
## check it's empty
list.files(hdf5_temp_dir)
#> character(0)

setHDF5DumpDir(hdf5_temp_dir)
## setting this seems to create an empty H5 file automatically
list.files(hdf5_temp_dir)
#> [1] "auto00001.h5"
dump_file <- getHDF5DumpFile()
dump_file
#> [1] "/tmp/testing/HDF5Array/auto00001.h5"
file.size( dump_file )
#> [1] 800

B3 <- array(runif(2*3*4), dim=c(2,3,4))
B3 <- as(B3, "HDF5Array")

## now the temporary dump file is bigger as we've written something
## and HDF5Array is set to use another file for the next operation
file.size( dump_file )
#> [1] 6725
getHDF5DumpFile()
#> [1] "/tmp/testing/HDF5Array/auto00002.h5"
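
Since the question was about block processing with AutoRealizationSink, here is a minimal sketch of how the dump directory setting might be combined with that workflow. The directory name my_hdf5_dump, the array dimensions, and the one-slice-per-block grid are assumptions chosen for illustration; tempdir() is used only so the sketch runs anywhere and should be replaced with a directory on a disk that has enough space. Exact behaviour may differ between DelayedArray/HDF5Array versions.

```r
library(DelayedArray)
library(HDF5Array)

## Hypothetical dump location (placeholder): point this at a large disk
dump_dir <- file.path(tempdir(), "my_hdf5_dump")
## create the directory before handing it to HDF5Array
dir.create(dump_dir, recursive = TRUE, showWarnings = FALSE)
setHDF5DumpDir(dump_dir)

## Sinks created from now on are HDF5-backed
setAutoRealizationBackend("HDF5Array")
sink <- AutoRealizationSink(c(20L, 30L, 4L), type = "double")

## Assumed grid: one block per slice along the third dimension
sink_grid <- RegularArrayGrid(dim(sink), spacings = c(20L, 30L, 1L))
for (bid in seq_along(sink_grid)) {
    viewport <- sink_grid[[bid]]
    block <- array(runif(prod(dim(viewport))), dim = dim(viewport))
    sink <- write_block(sink, viewport, block)
}
close(sink)
A <- as(sink, "DelayedArray")

## Directory of the file backing the result; expected to be the dump directory
dirname(path(A))

## Unset the backend again so later code is not affected
setAutoRealizationBackend(NULL)
```

If this behaves as expected, list.files(dump_dir) should show the automatically named .h5 file(s) rather than anything under the default location.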


Thank you, Mike Smith! This is what I was looking for. I've confirmed that when running DelayedArray, the intermediate HDF5 files are written to the directory I set, not /tmp.
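
For a one-off write where a single array should go to a specific file rather than to the dump directory, the filepath argument of writeHDF5Array mentioned in the question may be enough. A minimal sketch, assuming a hypothetical output directory explicit_h5 and dataset name "B3" (again, tempdir() is only a placeholder for a directory on a larger disk):

```r
library(HDF5Array)

## Hypothetical output location (placeholder): replace with your own directory
out_dir <- file.path(tempdir(), "explicit_h5")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
out_file <- file.path(out_dir, "B3.h5")

B3 <- array(runif(2*3*4), dim = c(2, 3, 4))
## Write to the chosen file; "B3" is an assumed dataset name inside the file
B3_h5 <- writeHDF5Array(B3, filepath = out_file, name = "B3")

## File backing the returned HDF5Array
path(B3_h5)
```

This only covers explicit writes; functions that realize data automatically through the realization backend would still rely on the dump directory set with setHDF5DumpDir().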