Hello,
I am using the rhdf5 package to build a large h5 file with climate data for a specific geographic domain.
The domain has 48x47 (lon x lat) points in space. Climate variables (precipitation, temperature...) are organized in a matrix of 2256 rows (48*47=2256) and 248 columns (8 observations/day for a 31-day month).
To satisfy the requirements of the destination model, I need to structure the h5 dataset in the form (time, lon, lat), i.e. (248,48,47). To do that, I transformed the matrix of observations into an array of dimension c(48,47,248) (lon, lat, time) and then used 'aperm' to switch the order of the dimensions.
However, when I write the dataset to the h5 file, I get the following message: "Writing of this type of data not supported."
Here is the code I am using:
# load package from bioconductor
require(rhdf5)
setwd("path/to/file")
lon <-read.csv("lon_h5.csv", header=FALSE)
lon <-as.matrix(lon) #matrix 48x47
lat <-read.csv("lat_h5.csv", header=FALSE)
lat<-as.matrix(lat) #matrix 48x47
h5createFile("file.h5")
h5createDataset("file.h5", "lon",c(48,47), storage.mode = "double")
h5createDataset("file.h5", "lat",c(48,47), storage.mode = "double")
h5write(lon, file="file.h5", name="lon")
h5write(lat, file="file.h5", name="lat")
tmp <-read.csv(file="temperature.csv", header=TRUE)
tmp = array(tmp,dim=c(48,47,248)) # it loops the 48 longitude points first, then the 47 latitude points, then 248 time steps
tmp = aperm(a=tmp,perm=c(3,1,2)) # switch the order of the dimensions, putting time first, then longitude, then latitude
h5createDataset("file.h5", "tmp",c(248,48,47), storage.mode = "double")
h5write(tmp, file="file.h5", name="tmp")
'Writing of this type of data not supported.'
The array has 559488 elements (48*47*248), so its size should not be the problem.
There is no problem when I write a matrix, as with the lon and lat matrices. Does anybody know if the rhdf5 package has problems with arrays?
The storage mode of the array is 'list'; is this the problem? How do I deal with that?
Thank you
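The 'list' storage mode mentioned above can be checked directly on what read.csv returns. Here is a minimal sketch in base R, assuming a tiny in-memory CSV string as a stand-in for the real temperature.csv (the values and the names df and m are made up for illustration):

```r
# Hypothetical 2x2 stand-in for a CSV file; read.csv(text = ...) parses a string.
df <- read.csv(text = "1.5,2.5\n3.5,4.5", header = FALSE)

class(df)         # "data.frame"
typeof(df)        # "list": a data.frame is a list of columns
is.numeric(df)    # FALSE, even though every column is numeric

# data.matrix() converts the data.frame to a numeric matrix, keeping the columns.
m <- data.matrix(df)
storage.mode(m)   # "double"
dim(m)            # 2 2
```

Checking typeof() or storage.mode() right after reading the file shows whether the object is still list-based before any call to array.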
Hi Julian, thanks a lot for your reply.
I tried to change the storage mode as you suggested, but the message is:
Error in storage.mode(tmp) = "double" :
Hi, Fabio. I think there's something not quite right with the coercion you attempt when you call array on the tmp returned by read.csv. Can you verify that tmp = array(tmp,dim=c(48,47,248)) really gives the array you want? You can see from ?array that array expects a vector as its first argument. A data.frame (the type returned from read.csv) is represented by a named list of columns with a row.names attribute, so passing that to array might be part of the problem. data.matrix will convert a data.frame to a numeric matrix.
However you do it, I think you need to start by getting tmp out of its list-based (data.frame) representation. So you were on the right track with tmp <- as.numeric(unlist(tmp)): this will simply concatenate the contents of each of the columns of the data.frame into one long vector, and from there you can call array to assign the dimensions you want. Be sure to verify the elements ended up in the right place! data.matrix, on the other hand, will keep the columns for you. See ?data.matrix.
Nate and Julian, thanks a lot for this.
The problem was there indeed. It didn't work before only because I was trying to unlist tmp after transforming it into an array.
So, this is the code that finally worked:
The array looks exactly the same, but the storage mode goes from 'list' to 'double'.
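For anyone following along, here is a minimal sketch of the approach discussed in this thread (unlist first, then array, then aperm). It is not the exact code from the post. It assumes a small synthetic data.frame in place of temperature.csv, with 3 lon x 2 lat points and 4 time steps; all names (n_lon, tmp_df, and so on) are made up for illustration. Each value encodes its own position so the element placement can be checked.

```r
# Hypothetical small domain standing in for 48 x 47 x 248.
n_lon <- 3; n_lat <- 2; n_time <- 4

# Rows loop over lon first, then lat (expand.grid varies its first factor fastest).
grid <- expand.grid(lon = seq_len(n_lon), lat = seq_len(n_lat))

# One column per time step; each value encodes lon + 10*lat + 100*time.
obs <- sapply(seq_len(n_time), function(t) grid$lon + 10 * grid$lat + 100 * t)
tmp_df <- as.data.frame(obs)   # same shape as a read.csv result: rows = points, cols = time

# Step 1: leave the list-based representation BEFORE building the array.
tmp_vec <- as.numeric(unlist(tmp_df))   # columns concatenated into one long vector

# Step 2: assign dimensions (lon, lat, time); R fills the first index fastest.
tmp_arr <- array(tmp_vec, dim = c(n_lon, n_lat, n_time))

# Step 3: reorder to (time, lon, lat).
tmp_perm <- aperm(tmp_arr, perm = c(3, 1, 2))

storage.mode(tmp_perm)   # "double"
dim(tmp_perm)            # 4 3 2

# Spot check: the element at (time = 4, lon = 3, lat = 2) should encode its position.
tmp_perm[4, 3, 2] == 3 + 10 * 2 + 100 * 4
```

Encoding the indices in the values is only a testing trick; with real data, comparing a few known grid points and time steps against the original matrix serves the same purpose.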
Thanks a lot for your help
Fabio