Reading Strings from MAT Files using rhdf5
ejb6 • 0
@ejb6-7520
Last seen 6.6 years ago
United States

I am trying to use rhdf5 to read data from MAT files (generated in MATLAB 2013a using v7.3 of the MAT file format) in R. For the most part it works great, but I'm getting some strange behavior trying to load strings. In MATLAB, I create a 596x55 char array, x. I then create a second variable, y = x(1:end-1,:), which just contains the first 595 rows of x. I then save both to a file:

save z:/temp/test73.mat -v7.3 x y

I can load the file in MATLAB without any issues.

>> load('z:/temp/test73.mat', 'x')
>> x(1,:)

ans =

GRUPO AEROPORT DEL PACIFIC-B

>> y(1,:)

ans =

GRUPO AEROPORT DEL PACIFIC-B               

In R, I am able to load y from the file without any issues:

> values <- h5read("z:/temp/test73.mat", "/y")
> values[1,]
[1] 71 82 85 80 79 32 65 69 82 79 80 79 82 84 32 68 69 76 32 80 65 67 73 70 73 67 45 66 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
>     mode(values) <- "raw"
>     values <- rawToChar(t(values))
>     values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
>     values <- gsub(" *$", "", values)
> values[1]
[1] "GRUPO AEROPORT DEL PACIFIC-B"
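
The conversion above goes through raw mode, which only works when every code is between 1 and 255. As a hedged alternative, here is a minimal sketch that converts each row of character codes with base R's intToUtf8() instead. The helper name codes_to_strings and the small test matrix are made up for illustration; they are not part of rhdf5 or of the original post. It assumes one string per row, as in the y example, and it would not by itself explain the scrambled character order seen when reading x.

```r
# Hypothetical helper (not part of rhdf5): convert a numeric matrix of
# character codes, one string per row, into a character vector.
codes_to_strings <- function(codes) {
  stopifnot(is.matrix(codes), is.numeric(codes))
  # Drop zero and negative codes, then build one string per row.
  strings <- apply(codes, 1, function(row) intToUtf8(row[row > 0]))
  # Remove the trailing space padding that MATLAB char arrays carry.
  sub(" +$", "", strings)
}

# Small synthetic example (not the data from the MAT file).
m <- rbind(utf8ToInt("AB  "), utf8ToInt("CDE "))
result <- codes_to_strings(m)
print(result)
```

Because no coercion to raw is involved, codes above 255 are kept as Unicode code points rather than being silently replaced by 0.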

However, I am unable to load x:

> values <- h5read("z:/temp/test73.mat", "/x")
> values[1,]
[1] 71 85 79 66 82 78 79 69 32 79 71 82 73 70 32 82 32 32 32 69 48 32 79 68 32 32 32 32 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
>     mode(values) <- "raw"
Warning message:
In eval(expr, envir, enclos) :
out-of-range values treated as 0 in coercion to raw
>     values <- rawToChar(t(values))
Error in rawToChar(t(values)) :
embedded nul in string: 'GUOBRNOE OGRIF R   E0 OD                               GUOOANOHBO HC A ROPN   U            R                  GUOUOIO EFT RC SL HT                                   GUO UAOEIADCC WM L N D                                 GUO INONROI AAA B CLHST-                               GUR INO(YCACT  C  TDC        O                         GUR LNOOTOIOI-N  AONNA G                               GUTULNE C RRAA THCNNC Y  O                             GUYOLOUOSCITLSCIG PP        E                          GRY LOAO CEEISECOPIR     B                             GRC GOHOACOAR   OACLD-       R  N     1                GRK MOHOACIAE   O CR I C A          R                  GRT ARIORATSI   TN  L               R                  GR  HNIORAUS   /  LRC  C                               GRANWRIORAINUA ALIN  L R    U                          GRA.ROIOU HUSA  SE SC KT        U                      GRR EXIORNA N R LD S C E                               GRR0LRI QH  P   R  -L
>     values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
Error in seq.default(1, nchar(values) - 1, nc) : 'to' must be of length 1



The output of sessionInfo() is:

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rhdf5_2.6.0

loaded via a namespace (and not attached):
[1] tools_3.0.3    zlibbioc_1.8.0

I would appreciate any help in solving this.

Thanks.

- Elliot


I'm wondering if there are encoding issues. The raw type, as one might expect, requires every value to be in byte range, i.e., 0 <= x <= 255.

Try this in an R session:

val <- as.raw(c(0, -21, 42, 255, 256))
# same warning you get
rawToChar(val)
# same error you get

And the man page for rawToChar (?rawToChar) warns about encoding issues, and says only trailing nuls are allowed.

Does which(values < 0 | values > 255) return anything?
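
To make that check concrete, here is a minimal sketch on a synthetic matrix (a stand-in for the result of h5read(), not the real data). It locates values outside the byte range, and separately locates zeros, which are valid raw values but would show up as embedded nuls in rawToChar(). The variable names out_of_range and zeros are illustrative.

```r
# Synthetic stand-in for the matrix returned by h5read(); not the real data.
values <- matrix(c(71L, 82L, 0L, 300L, 32L, -5L), nrow = 2)

# Overall range: anything below 0 or above 255 cannot be represented as raw.
print(range(values))

# Row/column positions of values outside the byte range.
out_of_range <- which(values < 0 | values > 255, arr.ind = TRUE)

# Zeros fit in a raw vector but become embedded nuls for rawToChar().
zeros <- which(values == 0, arr.ind = TRUE)

print(out_of_range)
print(zeros)
```

With arr.ind = TRUE, which() returns row and column indices, which makes it easier to see whether the problem values cluster in particular rows or columns.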

Bernd Fischer ▴ 540
@bernd-fischer-5348
Last seen 4.8 years ago
Germany / Heidelberg / DKFZ

What is the HDF5 data-type of x and y? What is the data space?
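
One way to answer these questions is to list the file contents with h5ls() from rhdf5, which reports the data class and dimensions of each dataset, plus further type details with all = TRUE. The sketch below runs on a temporary file with synthetic data, since test73.mat is not available here; on the real file, the call would be h5ls("z:/temp/test73.mat", all = TRUE). The requireNamespace() guard is only there so the snippet does nothing when rhdf5 is not installed.

```r
# Sketch only: requires the rhdf5 package and uses a temporary file with
# synthetic data rather than the poster's test73.mat.
if (requireNamespace("rhdf5", quietly = TRUE)) {
  library(rhdf5)

  tmp <- tempfile(fileext = ".h5")
  h5createFile(tmp)
  h5write(matrix(1:6, nrow = 2), tmp, "x")

  # all = TRUE requests extra columns describing the data type and dataspace.
  info <- h5ls(tmp, all = TRUE)
  print(info)

  unlink(tmp)
}
```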

Can you provide the dataset test73.mat? Otherwise we cannot fully reproduce your error, and it is impossible to say whether the error occurs while rhdf5 reads the data or in your interpretation of the data (the conversion via raw mode).

Bernd