Reading Strings from MAT Files using rhdf5
ejb6 • 0
@ejb6-7520
Last seen 9.0 years ago
United States

I am trying to use rhdf5 to read data from MAT files (generated in MATLAB 2013a using v7.3 of the MAT file format) in R. For the most part it works great, but I'm getting some strange behavior trying to load strings. In MATLAB, I create a 596x55 char array, x. I then create a second variable, y = x(1:end-1,:), which just contains the first 595 rows of x. I then save both to a file:

save z:/temp/test73.mat -v7.3 x y

I can load the file in MATLAB without any issues.

>> load('z:/temp/test73.mat', 'x')
>> x(1,:)

ans =

GRUPO AEROPORT DEL PACIFIC-B                           

>> load('z:/temp/test73.mat', 'y')
>> y(1,:)

ans =

GRUPO AEROPORT DEL PACIFIC-B               

In R, I am able to load y from the file without any issues:

> values <- h5read("z:/temp/test73.mat", "/y")
> values[1,]
 [1] 71 82 85 80 79 32 65 69 82 79 80 79 82 84 32 68 69 76 32 80 65 67 73 70 73 67 45 66 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
>     mode(values) <- "raw"    
>     values <- rawToChar(t(values))
>     values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
>     values <- gsub(" *$", "", values)
> values[1]
[1] "GRUPO AEROPORT DEL PACIFIC-B"
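
The same conversion can also be done one row at a time, which avoids the transpose and the substring index arithmetic. This is only a minimal sketch in base R: `rows_to_strings` is a hypothetical helper name (not an rhdf5 function), the small matrix stands in for what h5read() returns, and it assumes every code lies in the range 1 to 255.

```r
# Hypothetical helper: turn a matrix of byte codes (one string per row)
# into a character vector with trailing space padding removed.
rows_to_strings <- function(codes) {
  # Assumption: all codes are in 1..255, so as.raw() loses nothing
  # and rawToChar() never sees an embedded nul.
  strings <- apply(codes, 1, function(row) rawToChar(as.raw(row)))
  sub(" +$", "", strings)
}

# Stand-in for the h5read() result: two strings padded to a fixed width.
padded <- sprintf("%-16s", c("GRUPO AEROPORT", "ABC"))
demo <- do.call(rbind, lapply(padded, utf8ToInt))

rows_to_strings(demo)
```

Because each row is converted separately, a bad value in one row cannot shift the boundaries of the other strings.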

However, I am unable to load x:

> values <- h5read("z:/temp/test73.mat", "/x")
> values[1,]
 [1] 71 85 79 66 82 78 79 69 32 79 71 82 73 70 32 82 32 32 32 69 48 32 79 68 32 32 32 32 32 32 32 32 32 32 32 32
[37] 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32
> nc <- ncol(values)
>     mode(values) <- "raw"    
Warning message:
In eval(expr, envir, enclos) :
  out-of-range values treated as 0 in coercion to raw
>     values <- rawToChar(t(values))
Error in rawToChar(t(values)) : 
  embedded nul in string: 'GUOBRNOE OGRIF R   E0 OD                               GUOOANOHBO HC A ROPN   U            R                  GUOUOIO EFT RC SL HT                                   GUO UAOEIADCC WM L N D                                 GUO INONROI AAA B CLHST-                               GUR INO(YCACT  C  TDC        O                         GUR LNOOTOIOI-N  AONNA G                               GUTULNE C RRAA THCNNC Y  O                             GUYOLOUOSCITLSCIG PP        E                          GRY LOAO CEEISECOPIR     B                             GRC GOHOACOAR   OACLD-       R  N     1                GRK MOHOACIAE   O CR I C A          R                  GRT ARIORATSI   TN  L               R                  GR  HNIORAUS   /  LRC  C                               GRANWRIORAINUA ALIN  L R    U                          GRA.ROIOU HUSA  SE SC KT        U                      GRR EXIORNA N R LD S C E                               GRR0LRI QH  P   R  -L         
>     values <- substring(values, seq(1, nchar(values)-1, nc), seq(nc, nchar(values), nc))
Error in seq.default(1, nchar(values) - 1, nc) : 'to' must be of length 1

The output of sessionInfo() is:

> sessionInfo()
R version 3.0.3 (2014-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.6.0

loaded via a namespace (and not attached):
[1] tools_3.0.3    zlibbioc_1.8.0

I would appreciate any help in solving this.

Thanks.

- Elliot


I'm wondering if there are encoding issues. The raw type, as one might predict, expects every value to be in byte range, i.e., 0 <= x <= 255.

Try this in an R session:

val <- as.raw(c(0, -21, 42, 255, 256))
# same warning you get
rawToChar(val)
# same error you get

And the man page for rawToChar (?rawToChar) warns about encoding issues, and says only trailing nuls are allowed.

Does which(values < 0 | values > 255), run on the matrix straight after h5read and before the coercion to raw, return anything?
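
To see where the problem values sit, rather than just whether any exist, something along these lines might help. It is a sketch in base R only: `find_bad_codes` is a made-up name, and the small matrix stands in for the result of h5read(). It also flags zeros, since a 0 becomes a nul byte and rawToChar() rejects a nul that is followed by other bytes.

```r
# Hypothetical helper: list every entry that cannot be converted safely.
find_bad_codes <- function(codes) {
  # Values as.raw() cannot represent, plus 0, which becomes a nul byte.
  bad <- is.na(codes) | codes < 1 | codes > 255
  idx <- which(bad, arr.ind = TRUE)
  data.frame(row = idx[, "row"], col = idx[, "col"], value = codes[idx])
}

# Stand-in data: the second row holds one out-of-range value and one zero.
demo <- matrix(c(71,  82, 85,
                 71, 300,  0), nrow = 2, byrow = TRUE)

find_bad_codes(demo)
```

An empty data frame would mean that every value fits in a byte, in which case the problem lies elsewhere.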

Bernd Fischer ▴ 550
@bernd-fischer-5348
Last seen 7.3 years ago
Germany / Heidelberg / DKFZ

What is the HDF5 data-type of x and y? What is the data space?

Can you provide the dataset test73.mat? Otherwise we cannot fully reproduce your error, and it is impossible to say whether the error occurs while rhdf5 reads the data or in your interpretation of the data (the conversion via raw mode).
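
One way to collect this information is h5ls() from rhdf5 with all = TRUE, which, as far as I know, adds dtype, stype, rank and dim columns to the listing. The sketch below writes a small throwaway file so that it runs on its own; the dataset name `demo_chars` and the temporary file are illustrative assumptions, and in practice the path would be the MAT file itself.

```r
library(rhdf5)

# Throwaway file standing in for the real MAT file (assumption for the demo).
path <- tempfile(fileext = ".h5")
h5createFile(path)
h5write(matrix(65:70, nrow = 2), path, "demo_chars")

# One row per object; all = TRUE adds the stored datatype (dtype),
# the dataspace type (stype), and the rank and dimensions.
info <- h5ls(path, all = TRUE)
info[, c("name", "dclass", "dtype", "stype", "rank", "dim")]
```

Comparing the rows for x and y in such a listing would show whether the two datasets are stored with the same type and layout.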

Bernd

 
