I'm writing a few wrappers for reading and writing files in the biom format, which is stored as HDF5. The rhdf5 package is great, but the information at the beginning of every file (for example: https://github.com/biocore/biom-format/blob/master/examples/rich_sparse_otu_table_hdf5.biom ) is missing from what rhdf5 returns, even though I can see it when I run the command-line version of h5dump.
Running the command-line h5dump, version 1.8.7, I can see creation-date, format-url, format-version, etc. (see below).
However, when I run h5read/h5ls/h5dump from rhdf5 on the same file, none of these categories/groups come up (the command-line h5dump lists them as ATTRIBUTEs of the root group "/"). My goal is to get format-version and the other entries that are not showing up.
In particular, the rhdf5::h5dump function does not open the file, unlike h5dump run from the terminal.
Example:
# in R
> h5dump("~/Desktop/rich_sparse_otu_table_hdf5.biom")
HDF5-DIAG: Error detected in HDF5 (1.8.7) thread 0:
  #000: H5F.c line 1522 in H5Fopen(): unable to open file
    major: File accessability
    minor: Unable to open file
  #001: H5F.c line 1211 in H5F_open(): unable to open file: time = Tue Oct 28 00:27:02 2014 , name = '~/Desktop/rich_sparse_otu_table_hdf5.biom', tent_flags = 1
    major: File accessability
    minor: Unable to open file
  #002: H5FD.c line 1086 in H5FD_open(): open failed
    major: Virtual File Layer
    minor: Unable to initialize object
  #003: H5FDsec2.c line 348 in H5FD_sec2_open(): unable to open file: name = '~/Desktop/rich_sparse_otu_table_hdf5.biom', errno = 2, error message = 'No such file or directory', flags = 1, o_flags = 2
    major: File accessability
    minor: Unable to open file
HDF5: unable to open file
Error in h5checktypeOrOpenLoc(file) :
  Error in h5checktypeOrOpenLoc(). File '~/Desktop/rich_sparse_otu_table_hdf5.biom' is not a valid HDF5 file.
> str(h5read("./rich_sparse_otu_table_hdf5.biom","/"))
List of 2
 $ observation:List of 4
  ..$ group-metadata: NULL
  ..$ ids : chr [1:5(1d)] "GG_OTU_1" "GG_OTU_2" "GG_OTU_3" "GG_OTU_4" ...
  ..$ matrix :List of 3
  .. ..$ data : num [1:15(1d)] 1 5 1 2 3 1 1 4 2 2 ...
  .. ..$ indices: int [1:15(1d)] 2 0 1 3 4 5 2 3 5 0 ...
  .. ..$ indptr : int [1:6(1d)] 0 1 6 9 13 15
  ..$ metadata :List of 1
  .. ..$ taxonomy: chr [1:7, 1:5] "k__Bacteria" "p__Proteobacteria" "c__Gammaproteobacteria" "o__Enterobacteriales" ...
 $ sample :List of 4
  ..$ group-metadata: NULL
  ..$ ids : chr [1:6(1d)] "Sample1" "Sample2" "Sample3" "Sample4" ...
  ..$ matrix :List of 3
  .. ..$ data : num [1:15(1d)] 5 2 1 1 1 1 1 1 1 2 ...
  .. ..$ indices: int [1:15(1d)] 1 3 1 3 4 0 2 3 4 1 ...
  .. ..$ indptr : int [1:7(1d)] 0 2 5 9 11 12 15
  ..$ metadata :List of 4
  .. ..$ BODY_SITE : chr [1:6(1d)] "gut" "gut" "gut" "skin" ...
  .. ..$ BarcodeSequence : chr [1:6(1d)] "CGCTTATCGAGA" "CATACCAGTAGC" "CTCTCTACCTGT" "CTCTCGGCCTGT" ...
  .. ..$ Description : chr [1:6(1d)] "human gut" "human gut" "human gut" "human skin" ...
  .. ..$ LinkerPrimerSequence: chr [1:6(1d)] "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" "CATGCTGCCTCCCGTAGGAGT" ...

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rhdf5_2.10.0 BiocInstaller_1.16.0

loaded via a namespace (and not attached):
[1] tools_3.1.0 zlibbioc_1.12.0
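
A note on the failed h5dump call in the R session above: the trace reports errno = 2 ('No such file or directory') for the literal name '~/Desktop/rich_sparse_otu_table_hdf5.biom', while the h5read call on the relative path succeeds. One possible reading, which I have not confirmed, is that the "~" is passed to the HDF5 library unexpanded. A minimal sketch of ruling that out, using base R's path.expand() and file.exists() before calling rhdf5 (the variable name biom_path is just for illustration):

```r
# Expand "~" in R before the path reaches the HDF5 library.
biom_path <- path.expand("~/Desktop/rich_sparse_otu_table_hdf5.biom")

# Check that the file exists first, so a bad path is reported by R
# rather than as an HDF5-DIAG trace.
if (file.exists(biom_path)) {
  # h5ls() lists the groups and datasets in the file.
  print(rhdf5::h5ls(biom_path))
} else {
  message("File not found: ", biom_path)
}
```

If the listing works with the expanded path, the earlier error was about the path rather than about the file's contents.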
# Terminal
./hdf5-1.8.7-mac-intel-x86_64-static/bin/h5dump ./rich_sparse_otu_table_hdf5.biom
HDF5 "./rich_sparse_otu_table_hdf5.biom" {
GROUP "/" {
   ATTRIBUTE "creation-date" {
      DATATYPE H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "2014-07-29T16:16:36.617320"
      }
   }
   ATTRIBUTE "format-url" {
      DATATYPE H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "http://biom-format.org"
      }
   }
   ATTRIBUTE "format-version" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 2, 1
      }
   }
   ATTRIBUTE "generated-by" {
      DATATYPE H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "example"
      }
   }
   ATTRIBUTE "id" {
      DATATYPE H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "No Table ID"
      }
   }
   ATTRIBUTE "nnz" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SCALAR
      DATA {
      (0): 15
      }
   }
   ATTRIBUTE "shape" {
      DATATYPE H5T_STD_I64LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 5, 6
      }
   }
   ATTRIBUTE "type" {
      DATATYPE H5T_STRING {
         STRSIZE H5T_VARIABLE;
         STRPAD H5T_STR_NULLTERM;
         CSET H5T_CSET_ASCII;
         CTYPE H5T_C_S1;
      }
      DATASPACE SCALAR
      DATA {
      (0): "otu table"
      }
   }
.....
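
Since the terminal output lists creation-date, format-version, and the rest as ATTRIBUTEs on the root group "/" rather than as groups or datasets, that would explain why h5read and h5ls do not show them. A minimal sketch of reading them through rhdf5's h5readAttributes(), assuming the installed rhdf5 provides that function; the helper name read_biom_root_attributes is hypothetical and not part of rhdf5:

```r
# Hypothetical helper (not part of rhdf5): return the attributes attached
# to the root group "/" of an HDF5 file as a named list.
read_biom_root_attributes <- function(path) {
  # Expand "~" and fail early in R if the file is missing.
  path <- path.expand(path)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # h5readAttributes(file, name) reads the attributes of the object `name`.
  rhdf5::h5readAttributes(path, "/")
}

# Usage, assuming the example file is in the working directory:
# attrs <- read_biom_root_attributes("./rich_sparse_otu_table_hdf5.biom")
# names(attrs)
# attrs[["format-version"]]
```

Whether the variable-length string attributes (creation-date, format-url, and so on) are decoded correctly by rhdf5 2.10.0 is not something this sketch verifies; the integer attributes such as format-version, nnz, and shape are the simpler case to check first.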