Question: R: error while reading .h5 files from R using rhdf5 package
4 months ago by
him4u3240
him4u3240 wrote:

I am new to HDF5 files and am trying to read some of the sample files from https://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/

When I try to read one of the .h5 files in R:

library(rhdf5)
h5ls("h5ex_d_sofloat.h5")

I get the following error:

Error in H5Fopen(file, "H5F_ACC_RDONLY") : HDF5. File accessability. Unable to open file.

 

ADD COMMENTlink modified 4 months ago by Wolfgang Huber13k • written 4 months ago by him4u3240

Is the file in your current directory?

ADD REPLYlink written 4 months ago by WouterDeCoster100

Yes. File is in my current directory.
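
A quick way to confirm this from within R is sketched below. It uses only base R functions, and the file name is simply the one from the question, so adjust it if yours differs.

```r
# File name taken from the question; change it if your file is named differently.
h5_file <- "h5ex_d_sofloat.h5"

# The directory that R resolves relative paths against.
getwd()

# TRUE only if R can see the file at that relative path.
file.exists(h5_file)

# Size in bytes (NA if the file is missing); worth comparing with the
# size reported by the server the file came from.
file.size(h5_file)
```

If `file.exists()` returns `TRUE` and the error persists, the file itself is the next thing to check.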

ADD REPLYlink written 4 months ago by him4u3240

Can you update your post to include the output of sessionInfo() so we can see which versions of R and rhdf5 you are using? It works fine for me, e.g.

library(rhdf5)
##create temp file location and download to there
file_loc <- file.path(tempdir(), "h5ex_d_sofloat.h5")
download.file(url = "https://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5",
              destfile = file_loc)
> h5ls(file_loc)
  group name       otype dclass     dim
0     /  DS1 H5I_DATASET  FLOAT 64 x 32

My sessionInfo is:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

Matrix products: default
BLAS: /home/msmith/Applications/R/R-3.4.0/lib/libRblas.so
LAPACK: /home/msmith/Applications/R/R-3.4.0/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.21.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.23.0 compiler_3.4.0  tools_3.4.0 
ADD REPLYlink modified 4 months ago • written 4 months ago by Mike Smith2.1k
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.0  rhdf5_2.18.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.20.0  magrittr_1.5     assertthat_0.2.0 R6_2.2.2         tools_3.3.3     
[6] glue_1.1.0       tibble_1.3.3     Rcpp_0.12.11     rlang_0.1.1 
ADD REPLYlink written 4 months ago by him4u3240

Thanks for that.  I get the same error when running this with the latest versions on Windows.  I'll take a look and try to figure out what is going wrong.

ADD REPLYlink written 4 months ago by Mike Smith2.1k
4 months ago by
Mike Smith2.1k
EMBL Heidelberg / de.NBI
Mike Smith2.1k wrote:

I think this is related to how Windows treats binary and text files when you download them. I don't know how you obtained your file, but the download mode is what causes the error when I run my example code on Windows. I'll try to demonstrate it here.


First we'll load the library and set the download URL:

library(rhdf5)
file_url <- "http://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5"

Now we'll download the file twice, the first time treating it as a text file and the second time as binary. This is controlled by the mode argument of download.file().

h5_text_dl <- file.path(tempdir(), "h5.text.h5")
download.file(url = file_url,
              destfile = h5_text_dl, 
              mode = "w")

h5_binary_dl <- file.path(tempdir(), "h5.binary.h5")
download.file(url = file_url,
              destfile = h5_binary_dl, 
              mode = "wb")

The output that is printed to screen is identical, so I'll include it only once.  Note that the file is of length 8072 bytes.

trying URL 'http://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5'
Content type ' â³7û' length 8072 bytes
downloaded 8072 bytes

Now we'll do two operations on each downloaded file: ask for its size, and then try to list its contents. First the 'text' version:

> file.size(h5_text_dl)
[1] 8137
> h5ls(h5_text_dl)
 Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file. 

Now the binary version:

> file.size(h5_binary_dl)
[1] 8072
> h5ls(h5_binary_dl)
  group name       otype dclass     dim
0     /  DS1 H5I_DATASET  FLOAT 64 x 32

The text version is 65 bytes larger than the original download (8137 vs 8072 bytes; Windows has done something!), so it is no longer a valid HDF5 file, hence the error message. With the binary download the file stays intact and can be read.
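
To illustrate how a text-mode write could make a binary file grow, here is a minimal, self-contained sketch. It assumes the change is the usual Windows text-mode translation, where every LF byte (0x0a) is written as CR LF (0x0d 0x0a); that is an assumption about the cause, not something verified against these files. The helper `simulate_text_mode()` and the sample bytes are made up for the illustration.

```r
# Hypothetical helper: mimic a text-mode write on Windows, where each
# LF byte (0x0a) is written out as the pair CR LF (0x0d 0x0a).
simulate_text_mode <- function(bytes) {
  lf <- as.raw(0x0a)
  cr <- as.raw(0x0d)
  # Expand every LF into CR LF and leave all other bytes untouched.
  unlist(lapply(bytes, function(b) if (b == lf) c(cr, lf) else b))
}

# Made-up binary payload that happens to contain two LF bytes.
original   <- as.raw(c(0x89, 0x48, 0x0a, 0x00, 0xff, 0x0a, 0x01))
translated <- simulate_text_mode(original)

length(original)               # 7
length(translated)             # 9
sum(original == as.raw(0x0a))  # 2, one extra byte per LF
```

If this is what happened to the real download, then counting the LF bytes in the intact file, e.g. `sum(readBin(h5_binary_dl, "raw", file.size(h5_binary_dl)) == as.raw(0x0a))`, should give the same number as the size difference between the two files.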


This doesn't happen on Linux or Mac, but on Windows it's important to set the mode argument correctly.  I've no idea if this also applies to other methods of downloading files, but this was the root cause of my problem.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Mike Smith2.1k

Thanks so much Mike. It's working now without any issues. Didn't expect the help so quickly. Thank you. :)

ADD REPLYlink written 4 months ago by him4u3240
4 months ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:

Great detective work by Mike.
For background, perhaps this is related to the infamous LF vs CR + LF issue (see e.g. https://winscp.net/eng/docs/faq_line_breaks)

ADD COMMENTlink modified 4 months ago • written 4 months ago by Wolfgang Huber13k

I'm pretty sure that's exactly the issue.  I've written a little more about it in a blog post here: http://www.msmith.de/2017/06/23/download-file/

ADD REPLYlink written 4 months ago by Mike Smith2.1k

"Given this is an explicit design choice it would be nice if the library suggested that as a reason for the failure, rather than simply reporting it was unable to open the file, but you can’t have everything."

Could h5ls, h5read etc. in the rhdf5 package catch the problem and emit a helpful error message?
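
One possible shape for such a check, as a rough sketch rather than anything rhdf5 actually does: every HDF5 file carries an 8-byte format signature, and that signature contains both a CR LF pair and a lone LF, so a text-mode download changes it. The helpers `has_hdf5_signature()` and `check_h5_file()` are hypothetical names, and this sketch only looks at offset 0, so it would wrongly reject a valid file that has a user block in front of the signature.

```r
# The 8-byte HDF5 format signature: \211 H D F \r \n \032 \n
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical helper (not part of rhdf5): TRUE if the file starts with
# the HDF5 signature. Only offset 0 is checked.
has_hdf5_signature <- function(path) {
  if (!file.exists(path)) return(FALSE)
  header <- readBin(path, what = "raw", n = 8L)
  identical(header, hdf5_signature)
}

# Hypothetical wrapper: stop with a more descriptive message before the
# file is handed to h5ls(), h5read() and friends.
check_h5_file <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!has_hdf5_signature(path)) {
    stop("'", path, "' does not start with the HDF5 signature. ",
         "If it was downloaded on Windows, try download.file(..., mode = \"wb\").")
  }
  invisible(path)
}
```

With these in place a call could look like `h5ls(check_h5_file("h5ex_d_sofloat.h5"))`, which would fail early with the hint about `mode = "wb"` instead of the generic "Unable to open file" message.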

 

ADD REPLYlink written 4 months ago by Wolfgang Huber13k