Question: Problem with readFast5Summary in IONiseR related to HDF5
nathaniellegall wrote (19 months ago):

I'm having a bit of trouble running the readFast5Summary function in IONiseR. The error message mentions HDF5, so I have updated it from the command line using Homebrew:

brew install hdf5

pip3 install h5py

and tried running the code again, but it gives me this error:

> fast5files <- list.files(path = "/Volumes/NGS Lab/MinION/data/downloads/pass/batch_1487083485858", pattern = ".fast5$", full.names = TRUE)
> example.summary <- readFast5Summary( fast5files )
Checking file validity
Reading Channel Data
Reading Raw Data
Reading Template Data
Error in H5Aopen(did, "duration") :
HDF5. Attribute. Unable to initialize object.

I think I have installed all of the additional packages.

> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] BiocStyle_2.2.1 knitr_1.15.1    rmarkdown_1.4   testthat_1.0.2  gridExtra_2.2.1 ggplot2_2.2.1   IONiseR_1.4.4

loaded via a namespace (and not attached):
[1] Rcpp_0.12.10 RColorBrewer_1.1-2 GenomeInfoDb_1.10.3 plyr_1.8.4
[5] XVector_0.14.1 bitops_1.0-6 tools_3.3.3 zlibbioc_1.20.0
[9] digest_0.6.12 evaluate_0.10 tibble_1.2 gtable_0.2.0
[13] rhdf5_2.18.0 lattice_0.20-35 Matrix_1.2-8 DBI_0.6
[17] parallel_3.3.3 stringr_1.2.0 hwriter_1.3.2 dplyr_0.5.0
[21] Biostrings_2.42.1 S4Vectors_0.12.2 IRanges_2.8.2 rprojroot_1.2
[25] stats4_3.3.3 grid_3.3.3 data.table_1.10.4 Biobase_2.34.0
[29] R6_2.2.0 BiocParallel_1.8.1 latticeExtra_0.6-28 tidyr_0.6.1
[33] magrittr_1.5 backports_1.0.5 htmltools_0.3.5 scales_0.4.1
[37] Rsamtools_1.26.1 GenomicAlignments_1.10.1 BiocGenerics_0.20.0 GenomicRanges_1.26.4
[41] ShortRead_1.32.1 assertthat_0.1 SummarizedExperiment_1.4.0 colorspace_1.3-2
[45] stringi_1.1.3 RCurl_1.95-4.8 lazyeval_0.2.0 munsell_0.4.3
[49] crayon_1.3.2

Any help resolving this would be appreciated.

WouterDeCoster wrote (19 months ago):

I notice you have a space in the path to your data, which is generally not a good idea. It might be unrelated to this issue, but I would suggest testing it anyway.

Mike Smith (EMBL Heidelberg / de.NBI) wrote (19 months ago):

That's an unusual error, as it means an attribute I always expect to be present in the fast5 files (essentially how long the sequencing of that read took) can't be found. Unfortunately I can't tell from the message whether this affects all your files or just one.

To begin with, I would suggest running the code on a single file and seeing if you get the error, e.g.

example.summary <- readFast5Summary( fast5files[1] )

If that works fine, you might have to go through the slightly painful process of identifying the offending file by running subsets of the list of files until you get the error. If you manage to find an example file that throws the error, please send it to me (email, Google Drive, FTP, etc.) and I'll see if I can determine whether this is something odd in that file or a bug in IONiseR that needs patching.

nathaniellegall wrote (19 months ago):

I restarted the system and readFast5Summary now appears to work fine on the test data from another package. Maybe the HDF5 update needed a restart before it kicked in.

> fast5files <- list.files(path = "/Users/NLsMacBook/Documents/poretools-pfaucon/test_data", pattern = ".fast5$", full.names = TRUE)

> example.summary <- readFast5Summary( fast5files )
Checking file validity
Done

All of the other functions included in the tutorial (https://www.bioconductor.org/packages/devel/bioc/vignettes/IONiseR/inst/doc/IONiseR.html) work as well, so I am satisfied that the code works. But when it comes to running this on my own basecalled files, R returns another error.

> lambda <- list.files(path="/Volumes/NL 16GB/reads", pattern = ".fast5$", full.names = TRUE)
> lambda.summary <- readFast5Summary(lambda)
Checking file validity
Error in which(fileStatus) : argument to 'which' is not logical

I think someone else had a similar issue that was related to the file structure, so I will continue this issue as part of their thread. If I can't find it, I will start another thread.

Lescai, Francesco (Denmark) wrote (13 months ago):

Hi there, I now have a similar problem related to HDF5 reading. We sequenced with Nanopore last week and, although I tried to update all packages, I get this error message:

> library(IONiseR)
> fast5files <- list.files(path = "/data/nanopore/nano_20170831/fast5/pass/0/", pattern = ".fast5$", full.names = TRUE)
> summary <- readFast5Summary(fast5files)
Checking file validity
Error in H5Gopen(fid, "/Analyses/EventDetection_000/Reads") :
HDF5. Symbol table. Can't open object.
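The error names a specific HDF5 group, so one way to narrow it down is to check whether that group actually exists in each file. The sketch below assumes the listing comes from rhdf5::h5ls(), which returns a data frame with `group` and `name` columns; `has_h5_path` is a hypothetical helper written for illustration, not part of IONiseR or rhdf5.

```r
# Hypothetical helper: does an absolute HDF5 path appear in an h5ls()-style
# listing? Assumes `listing` has the `group` and `name` columns that
# rhdf5::h5ls() returns.
has_h5_path <- function(listing, path) {
  # h5ls() reports top-level objects with group "/", so joining with "/"
  # yields "//Analyses"; collapse the doubled slash into a clean path.
  full <- gsub("//", "/", paste(listing$group, listing$name, sep = "/"),
               fixed = TRUE)
  path %in% full
}
```

For example, `vapply(fast5files, function(f) has_h5_path(rhdf5::h5ls(f), "/Analyses/EventDetection_000/Reads"), logical(1))` would flag, file by file, whether the group from the error message is present.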


I've successfully loaded the metadata with the package poRe, so I would be tempted to say the data seem OK.
How would you suggest I further track down or solve the source of the error?

Here's my session info:


> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)

Matrix products: default
BLAS/LAPACK: /scratch/.com/extra/OpenBLAS/20150505/lib/libopenblas_sandybridgep-r0.2.14.so

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bindrcpp_0.2         IONiseR_2.0.0        BiocInstaller_1.26.1
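The earlier advice in this thread (run readFast5Summary on a single file, then narrow down subsets until the error appears) can be automated by calling the reader on each file separately and catching errors with tryCatch(). This is a minimal sketch, assuming readFast5Summary accepts a one-element vector, as in the fast5files[1] example earlier; `find_failing_files` is a hypothetical helper, and it checks files one at a time rather than bisecting, which is slower but simpler.

```r
# Hypothetical helper: call `reader` on each file on its own and collect
# the files that raise an error, together with the error message.
find_failing_files <- function(files, reader) {
  msgs <- vapply(files, function(f) {
    tryCatch({
      reader(f)          # e.g. readFast5Summary; result is discarded
      NA_character_      # NA marks a file that was read without error
    }, error = function(e) conditionMessage(e))
  }, character(1), USE.NAMES = FALSE)
  failed <- !is.na(msgs)
  data.frame(file = files[failed], error = msgs[failed],
             stringsAsFactors = FALSE)
}
```

Usage would be something like `bad <- find_failing_files(fast5files, readFast5Summary)`, after which `bad$file` lists candidate files to share with the package maintainer.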




Thanks for reporting the error. If you could make one of your Fast5 files available to me, I'll try to figure out what has changed in the file format and patch IONiseR accordingly. You can find my email address in the package DESCRIPTION file, or post a link to Dropbox, FTP, etc. here.


Hi there,

Any progress on that issue? It seems that I have the same issue, but unfortunately I am not able to provide a fast5 file at the moment.

Best,
Frank

Lescai, Francesco (Denmark) wrote (13 months ago):

I sent one of my fast5 files to Mike via email. I suppose they're working on it. Looking forward to hearing something too.


It looks like EventDetection data is not available in the file you provided. IONiseR assumed this would always be present and didn't have any checks built in to make sure this was true. I've patched it so the example file you sent can be read. This is available in IONiseR version 2.1.1. That should be built by Bioconductor in the next few days, or you can install it directly with BiocInstaller::biocLite("grimbough/IONiseR").
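After installing, it can be worth confirming that the session actually uses the patched release, since a namespace that is already loaded is not replaced until R is restarted. A minimal sketch, assuming 2.1.1 (the version named in the reply above) is the minimum needed; `is_patched_ioniser` is a hypothetical helper built on utils::packageVersion().

```r
# Hypothetical helper: is the given IONiseR version at least the patched
# release? package_version objects compare component-wise, so "2.10.0"
# correctly counts as newer than "2.9.0".
is_patched_ioniser <- function(version = utils::packageVersion("IONiseR"),
                               minimum = "2.1.1") {
  version >= minimum
}
```

Calling `is_patched_ioniser()` with no arguments checks the installed IONiseR; if it returns FALSE after an update, restarting R before loading the package again is a sensible first step.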

There might be some knock-on effects with other functions that expect event data to be present, so if you experience any other problems, please let me know and I'll fix them up.