R: error while reading .h5 files from R using the rhdf5 package
him4u324 ▴ 10
@him4u324-13313
Last seen 7.5 years ago

I am new to HDF5 files and am trying to read some of the sample files from https://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/

When I try to read one of the .h5 files in R:

library(rhdf5)
h5ls("h5ex_d_sofloat.h5")

I get the following error:

Error in H5Fopen(file, "H5F_ACC_RDONLY") : HDF5. File accessability. Unable to open file.


Is the file in your current directory?


Yes. File is in my current directory.


Can you update your post to include the output of sessionInfo() so we can see which versions of R and rhdf5 you are using?  It seems to work fine for me, e.g.:

library(rhdf5)
##create temp file location and download to there
file_loc <- file.path(tempdir(), "h5ex_d_sofloat.h5")
download.file(url = "https://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5",
              destfile = file_loc)
> h5ls(file_loc)
  group name       otype dclass     dim
0     /  DS1 H5I_DATASET  FLOAT 64 x 32

My sessionInfo is:

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

Matrix products: default
BLAS: /home/msmith/Applications/R/R-3.4.0/lib/libRblas.so
LAPACK: /home/msmith/Applications/R/R-3.4.0/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rhdf5_2.21.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.23.0 compiler_3.4.0  tools_3.4.0

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_India.1252  LC_CTYPE=English_India.1252    LC_MONETARY=English_India.1252
[4] LC_NUMERIC=C                   LC_TIME=English_India.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.7.0  rhdf5_2.18.0

loaded via a namespace (and not attached):
[1] zlibbioc_1.20.0  magrittr_1.5     assertthat_0.2.0 R6_2.2.2         tools_3.3.3     
[6] glue_1.1.0       tibble_1.3.3     Rcpp_0.12.11     rlang_0.1.1 

Thanks for that.  I get the same error when running this with the latest versions on Windows.  I'll take a look and try to figure out what is going wrong.

Mike Smith ★ 6.6k
@mike-smith
Last seen 6 hours ago
EMBL Heidelberg

I think this is related to how Windows treats binary and text files when you download them.  I don't know how you obtained the file, but the download mode is the source of the error in my example code.  I'll try to demonstrate it here.


First we'll load the library and set the download URL:

library(rhdf5)
file_url <- "http://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5"

Now we'll download the file twice.  The first time treating it as a text file, the second as binary.  This is specified by the mode argument.

h5_text_dl <- file.path(tempdir(), "h5.text.h5")
download.file(url = file_url,
              destfile = h5_text_dl, 
              mode = "w")

h5_binary_dl <- file.path(tempdir(), "h5.binary.h5")
download.file(url = file_url,
              destfile = h5_binary_dl, 
              mode = "wb")

The output that is printed to screen is identical, so I'll include it only once.  Note that the file is of length 8072 bytes.

trying URL 'http://support.hdfgroup.org/ftp/HDF5/examples/files/exbyapi/h5ex_d_sofloat.h5'
Content type ' â³7û' length 8072 bytes
downloaded 8072 bytes

Now we'll do two operations on the downloaded files: we'll ask for the size of each file, and then we'll try to list its contents.  First the 'text' version:

> file.size(h5_text_dl)
[1] 8137
> h5ls(h5_text_dl)
 Error in H5Fopen(file, "H5F_ACC_RDONLY") : 
  HDF5. File accessability. Unable to open file. 

Now the binary version:

> file.size(h5_binary_dl)
[1] 8072
> h5ls(h5_binary_dl)
  group name       otype dclass     dim
0     /  DS1 H5I_DATASET  FLOAT 64 x 32

The text version is not the same size as the original download (Windows has done something!), and it is no longer a valid HDF5 file, hence the error message.  With the binary download the file stays intact and can be read.


This doesn't happen on Linux or Mac, but on Windows it's important to set the mode argument correctly.  I've no idea if this also applies to other methods of downloading files, but this was the root cause of my problem.
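
As a rough sketch of how to guard against this, the download can be wrapped so that it always uses binary mode and, optionally, checks the size on disk against the size reported during the download. The helper name download_binary and its expected_bytes argument are hypothetical, not part of base R or rhdf5; only download.file() and file.size() are real functions.

```r
# Hypothetical helper (not part of rhdf5 or base R): always download in
# binary mode and optionally verify the size of the file on disk.
download_binary <- function(url, destfile, expected_bytes = NULL) {
  # mode = "wb" writes the bytes unchanged on every platform
  status <- download.file(url = url, destfile = destfile,
                          mode = "wb", quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }
  # Compare against the expected length, if the caller supplied one
  if (!is.null(expected_bytes) && file.size(destfile) != expected_bytes) {
    stop("size mismatch: expected ", expected_bytes,
         " bytes, found ", file.size(destfile))
  }
  invisible(destfile)
}
```

With the objects defined above, a call such as download_binary(file_url, h5_binary_dl, expected_bytes = 8072) should behave the same on Windows, Linux and Mac, and a text-mode style corruption would show up as a size mismatch rather than as an HDF5 error later on.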


Thanks so much, Mike. It's working now without any issues. I didn't expect help so quickly. Thank you. :)


Hi Mike,

I am using Mac and I also got the error message:

Error in H5Fopen(file, flags = flags, fapl = fapl, native = native) : 
  HDF5. File accessibility. Unable to open file.

Is this the same file-mode problem? If so, how can I solve it? I got the .h5 files directly from the Kallisto aligner, so I do not know how to change the mode.

I look forward to your reply!

@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…

Great detective work by Mike.
For background, perhaps this is related to the infamous LF vs CR + LF issue (see e.g. https://winscp.net/eng/docs/faq_line_breaks)
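
To illustrate what such a line-ending translation would do to a binary file, here is a minimal simulation in base R. It assumes the text-mode write simply replaces every LF byte (0x0A) with CR LF (0x0D 0x0A); the function name lf_to_crlf is made up for this example. Under that assumption, the 65 extra bytes seen above (8137 versus 8072) would correspond to 65 LF bytes in the original file.

```r
# Simulate a text-mode write: every LF byte (0x0A) becomes CR LF (0x0D 0x0A).
# Hypothetical helper for illustration only.
lf_to_crlf <- function(bytes) {
  stopifnot(is.raw(bytes))
  is_lf <- bytes == as.raw(0x0a)
  out <- raw(length(bytes) + sum(is_lf))
  # Position of each original byte in the output, shifted by the number
  # of CR bytes inserted up to and including that position
  pos <- seq_along(bytes) + cumsum(is_lf)
  out[pos] <- bytes
  # Put a CR directly in front of every LF
  out[pos[is_lf] - 1L] <- as.raw(0x0d)
  out
}

demo <- as.raw(c(0x48, 0x0a, 0x44, 0x0a))   # four bytes, two of them LF
length(lf_to_crlf(demo)) - length(demo)     # 2
```

The file grows by exactly one byte per LF, and every offset after the first LF shifts, which is enough to break a binary format whose internal addresses are fixed.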


I'm pretty sure that's exactly the issue.  I've written a little more about it in a blog post here: http://www.msmith.de/2017/06/23/download-file/


"Given this is an explicit design choice it would be nice if the library suggested that as a reason for the failure, rather than simply reporting it was unable to open the file, but you can’t have everything."

Could h5ls, h5read etc. in the rhdf5 package catch the problem and emit a helpful error message?
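
One way such a check might look, as a sketch only: an HDF5 file normally begins with the 8-byte signature 0x89 "HDF" CR LF 0x1A LF, so a pre-flight test in base R can compare the first 8 bytes of the file against it and suggest a likely cause when they differ. The function name check_hdf5_file and the wording of its messages are hypothetical and are not how rhdf5 behaves. The sketch only looks at offset 0, so a valid file that starts with a user block would be rejected.

```r
# The 8-byte HDF5 format signature: 0x89 'H' 'D' 'F' CR LF 0x1A LF
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical pre-flight check (not part of rhdf5): stop with a more
# descriptive message before handing the file to h5ls() or h5read().
check_hdf5_file <- function(path) {
  if (!file.exists(path)) {
    stop("file does not exist: ", path)
  }
  header <- readBin(path, what = "raw", n = 8L)
  if (!identical(header, hdf5_signature)) {
    stop("'", path, "' does not start with the HDF5 signature; ",
         "it may have been downloaded in text mode ",
         "(try mode = \"wb\") or be truncated")
  }
  invisible(TRUE)
}
```

Because the signature itself contains CR LF and a lone LF, a text-mode download on Windows would be expected to alter these first bytes, so the check would fail immediately with a message that points at the download mode.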
