Question

rhdf5 package and arrays in R

FabioF • 0

Hello,

I am using the package rhdf5 to build a large h5 file with climate data for a specific geographic domain.

The domain has 48 x 47 (lon x lat) grid points. Climate variables (precipitation, temperature...) are organized in a matrix of 2256 rows (48*47=2256) and 248 columns (8 observations/day for a 31-day month).

To satisfy the requirements of the destination model, I need to structure the h5 dataset in the form (time, lon, lat), i.e. (248,48,47). To do that, I reshaped the matrix of observations into an array of dimension c(48,47,248) (lon, lat, time) and then used 'aperm' to switch the order of the dimensions.
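The reshaping described here can be sketched on a toy grid. This is only an illustration: the sizes `n_lon`, `n_lat`, `n_time` and the integer data are made up, and it assumes the rows of the matrix are ordered with longitude varying fastest, as the code comments in the question state.

```r
# Toy sizes standing in for 48 x 47 x 248 (hypothetical values)
n_lon <- 3; n_lat <- 2; n_time <- 4

# One row per grid point (lon varies fastest), one column per time step
obs <- matrix(seq_len(n_lon * n_lat * n_time),
              nrow = n_lon * n_lat, ncol = n_time)

# R fills arrays in column-major order: lon index first, then lat, then time
arr <- array(obs, dim = c(n_lon, n_lat, n_time))

# Reorder the dimensions to (time, lon, lat)
arr_tll <- aperm(arr, perm = c(3, 1, 2))
dim(arr_tll)

# Grid point (lon = 2, lat = 2) is row (lat - 1) * n_lon + lon = 5 of obs,
# so its time series must match that row
identical(arr_tll[, 2, 2], obs[5, ])
```

A spot check like the last line is a cheap way to confirm that the time series of a known grid point ended up where expected.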

However, when I write the dataset to the h5 file, I get the following message: "Writing of this type of data not supported."

Here is the code I am using:

    # load package from Bioconductor
    require(rhdf5)

    setwd("path/to/file")

    lon <- read.csv("lon_h5.csv", header=FALSE)
    lon <- as.matrix(lon) # matrix 48x47
    lat <- read.csv("lat_h5.csv", header=FALSE)
    lat <- as.matrix(lat) # matrix 48x47

    h5createFile("file.h5")
    h5createDataset("file.h5", "lon", c(48,47), storage.mode = "double")
    h5createDataset("file.h5", "lat", c(48,47), storage.mode = "double")
    h5write(lon, file="file.h5", name="lon")
    h5write(lat, file="file.h5", name="lat")

    tmp <- read.csv(file="temperature.csv", header=TRUE)
    tmp = array(tmp, dim=c(48,47,248)) # it loops the 48 longitude points first, then the 47 latitude points, then 248 time steps
    tmp = aperm(a=tmp, perm=c(3,1,2)) # switch the order of the dimensions, putting time first, then longitude, then latitude
    h5createDataset("file.h5", "tmp", c(248,48,47), storage.mode = "double")
    h5write(tmp, file="file.h5", name="tmp")

'Writing of this type of data not supported.'

The array has 559488 elements (48*47*248), so it should not be a problem of dimensions.

There is no problem when I write a matrix, as for the lon and lat matrices. Does anybody know if the package rhdf5 has problems with arrays?

The storage mode of the array is 'list'. Is this the problem? If so, how do I deal with it?

Thank you

array R rhdf5 arrays h5 • 3.5k views

ADD COMMENT • link updated 10.1 years ago by Nathaniel Hayden ▴ 180 • written 10.1 years ago by FabioF • 0

score 0 · Answer 1 · 2015-02-04

Julian Gehring ★ 1.3k

When you import your data with read.csv, it gets stored as a data.frame. This is where the storage.mode == list comes from. You can change the storage mode with

    storage.mode(tmp) = "double"

Then h5write will succeed.
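To see where the 'list' storage mode comes from, it can help to inspect a small example. The sketch below uses a made-up two-column data.frame as a stand-in for what read.csv returns, and converts it with data.matrix (a base R function, one possible route among several).

```r
# Hypothetical stand-in for the result of read.csv(): 3 rows, 2 numeric columns
df <- data.frame(t1 = c(1.5, 2.5, 3.5), t2 = c(4.5, 5.5, 6.5))

storage.mode(df)      # "list": a data.frame is a list of columns

m <- data.matrix(df)  # convert to a numeric matrix
storage.mode(m)       # "double"
dim(m)                # 3 2
```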

ADD COMMENT • link 10.1 years ago Julian Gehring ★ 1.3k

Hi Julian, thanks a lot for your reply.

I tried to change the storage mode as you suggested, but the message is:

    Error in storage.mode(tmp) = "double" :
      (list) object cannot be coerced to type 'double'

Looking around, I found another possible way to change the storage mode:

    tmp <- as.numeric(unlist(tmp))

but this changes the size of my array from 559488 to 1262204908 elements (don't ask me why...), messing up everything.
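One plausible explanation for the jump in size, sketched on toy data: when array is given a data.frame, it works on the list of columns and recycles whole columns to fill the requested cells, so each cell holds an entire column. The data.frame below is a made-up stand-in for the read.csv output, with 6 rows and 4 columns instead of 2256 and 248.

```r
# Toy stand-in for read.csv() output: 6 grid points (rows), 4 time steps (columns)
df <- as.data.frame(matrix(as.numeric(1:24), nrow = 6, ncol = 4))

# array() sees a list of 4 columns and recycles them over 3 * 2 * 4 = 24 cells
bad <- array(df, dim = c(3, 2, 4))

storage.mode(bad)    # "list"
length(bad)          # 24 cells, each holding a whole column of 6 values
length(unlist(bad))  # 24 * 6 = 144
```

Scaled up to the real data, 559488 cells each holding a column of 2256 values gives roughly 1.26 billion elements, which is in line with the figure reported above.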

ADD REPLY • link 10.1 years ago FabioF • 0

Hi, Fabio. I think something is not quite right with the coercion that happens when you call array on the tmp returned by read.csv. Can you verify that tmp = array(tmp,dim=c(48,47,248)) really gives the array you want? You can see from ?array that array expects a vector as its first argument. A data.frame (the type returned by read.csv) is represented as a named list of columns with row.names and class attributes, so passing that to array might be part of the problem. data.matrix will convert a data.frame to a numeric matrix.

However you do it, I think you need to start by getting tmp out of its list-based (data.frame) representation. So you were on the right track with tmp <- as.numeric(unlist(tmp)): applied to the data.frame itself, this simply concatenates the contents of each column into one long vector, and from there you can call array to assign the dimensions you want. Be sure to verify the elements ended up in the right place! data.matrix, on the other hand, will keep the columns for you. See ?data.matrix.
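The two routes mentioned here can be compared on toy data. The sketch below again uses a made-up 6 x 4 data.frame in place of the real read.csv output and checks that both routes give the same numeric array, followed by a spot check of one element.

```r
# Toy stand-in for read.csv() output: 6 grid points (rows), 4 time steps (columns)
df <- as.data.frame(matrix(as.numeric(1:24), nrow = 6, ncol = 4))

# Route 1: flatten column by column, then assign dimensions
v  <- as.numeric(unlist(df))
a1 <- array(v, dim = c(3, 2, 4))

# Route 2: convert to a numeric matrix first, then assign dimensions
a2 <- array(data.matrix(df), dim = c(3, 2, 4))

identical(a1, a2)   # both routes fill the array in the same column-major order
storage.mode(a1)    # "double"

# Spot check: row 5 of df is grid point (lon = 2, lat = 2); column 3 is time step 3
a1[2, 2, 3] == df[5, 3]
```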

ADD REPLY • link 10.1 years ago Nathaniel Hayden ▴ 180

Nate and Julian, thanks a lot for this.

The problem was indeed there. It did not work before only because I was trying to unlist tmp after converting it to an array.

So, this is the code that finally worked:

    tmp <- read.csv(file="temperature.csv", header=TRUE)
    tmp <- as.matrix(tmp, rownames.force=FALSE)  # data.frame -> matrix (2256 x 248)
    tmp <- array(tmp, dim=c(48,47,248))          # (lon, lat, time)
    tmp <- aperm(a=tmp, perm=c(3,1,2))           # (time, lon, lat)
    h5write(tmp, file="file.h5", name="tmp")

The array looks exactly the same, but the storage mode goes from 'list' to 'double'.

Thanks a lot for your help

Fabio

ADD REPLY • link 10.0 years ago FabioF • 0

score 0 · Answer 2 · 2015-02-04

Just for reference, rhdf5 does not have problems with arrays:

> library(rhdf5)
> a = array(seq(1.1, 24.1, 1.0), c(2, 3, 4))
> h5fl <- tempfile()
> h5createFile(h5fl)
[1] TRUE
> h5createDataset(h5fl, "a", c(2, 3, 4), storage.mode="double")
[1] TRUE
> h5write(a, h5fl, "a")
> h5dump(h5fl)
$a
, , 1
     [,1] [,2] [,3]
[1,]  1.1  3.1  5.1
[2,]  2.1  4.1  6.1

, , 2
     [,1] [,2] [,3]
[1,]  7.1  9.1 11.1
[2,]  8.1 10.1 12.1

, , 3
     [,1] [,2] [,3]
[1,] 13.1 15.1 17.1
[2,] 14.1 16.1 18.1

, , 4
     [,1] [,2] [,3]
[1,] 19.1 21.1 23.1
[2,] 20.1 22.1 24.1

Note that, as a shortcut, you can also call h5write directly (without first explicitly creating a dataset) and it will usually do the right thing:

> h5write(a, h5fl, "a2")
> h5ls(h5fl)
  group name       otype dclass       dim
0     /    a H5I_DATASET  FLOAT 2 x 3 x 4
1     /   a2 H5I_DATASET  FLOAT 2 x 3 x 4
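A read-back check can confirm that what was written matches the array in memory. The sketch below assumes rhdf5 is installed and skips the check otherwise; the variable `round_trip_ok` is just an illustrative name.

```r
# Same toy array as in the transcript above
a <- array(seq(1.1, 24.1, 1.0), c(2, 3, 4))

round_trip_ok <- NA  # stays NA when rhdf5 is not available
if (requireNamespace("rhdf5", quietly = TRUE)) {
  f <- tempfile(fileext = ".h5")
  rhdf5::h5createFile(f)
  rhdf5::h5write(a, f, "a")   # dataset is created implicitly
  b <- rhdf5::h5read(f, "a")  # read the dataset back into R
  # Compare dimensions and values of the original and the read-back array
  round_trip_ok <- identical(dim(a), dim(b)) &&
    isTRUE(all.equal(as.vector(a), as.vector(b)))
}
round_trip_ok
```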