Importing Kallisto counts with tximport
zroger499 • 0
@zroger499-23414
Last seen 4 months ago
Portugal

Hi all. I'm using Kallisto with DESeq2 to do a DGE analysis on my samples. I used the following commands:

kallisto index -i cro_kal_index $StringTieDir/merged_transcriptome.fasta
# for each sample
kallisto quant -i cro_kal_index -o mapping_to_genome/07_kallisto/idio_1 $DATA/R1_cut_paired.gz $DATA/R2_cut_paired.fastq.gz


The DESeq2 manual advises using tximport to import the data files for downstream analysis. However, I get this error:

Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
HDF5. File accessibility. Unable to open file


My first approach was to use the tsv files instead of the h5 files. However, the same error still occurs.

The files were generated on a Linux server and transferred by scp to Windows. I checked the md5 sums and everything seems fine. What puzzles me most is that an error from the rhdf5 library occurs even with the tsv files.

Below is my script with the session info:

library(tximport)
library(rhdf5)

files <- "abundance.h5"
names(files) <- paste0("sample", 1)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)

#Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
#HDF5. File accessibility. Unable to open file
#md5 sum in server : a0810ec7b108ac4948d3ad0bbd697d63
#md5 sum in my pc: a0810ec7b108ac4948d3ad0bbd697d63

files <- "abundance.tsv"
names(files) <- paste0("sample", 1)
txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)

#Note: importing abundance.h5 is typically faster than abundance.tsv
#Error in H5Fopen(file, "H5F_ACC_RDONLY", native = native) :
#HDF5. File accessibility. Unable to open file.

#md5 sum in server : 442bf73d9eab9bfa28d9e1ab13d59b08
#md5 sum in pc :     442bf73d9eab9bfa28d9e1ab13d59b08

# R version 3.6.1 (2019-07-05)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 18362)
#
# Matrix products: default
#
# locale:
#   [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
# [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
# [5] LC_TIME=English_United Kingdom.1252
#
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
#   [1] rhdf5_2.30.1    tximport_1.14.2
#
# loaded via a namespace (and not attached):
#   [1] Rcpp_1.0.4.6        crayon_1.3.4        R6_2.4.1            lifecycle_0.2.0
# [5] magrittr_1.5        pillar_1.4.3        rlang_0.4.5         rstudioapi_0.11
# [9] vctrs_0.2.4         ellipsis_0.3.0      Rhdf5lib_1.8.0      tools_3.6.1
# [13] readr_1.3.1         glue_1.4.0          hms_0.5.3           compiler_3.6.1
# [17] pkgconfig_2.0.3     BiocManager_1.30.10 tibble_3.0.1


Yes, I have seen that other users have had a similar error, but I could not find a solution. If there is a post with the exact same error that I have missed, let me know.

Best

tximport Kallisto H5Fopen • 1.6k views

Cross-posted on Biostars: https://www.biostars.org/p/434285/

@mikelove
Last seen 3 hours ago
United States

Let's take tximport out of the equation to help debug.

Can you import h5 or tsv into R on their own, e.g. using h5read or read.delim?
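
A minimal sketch of that check, assuming the kallisto output directory holds the usual `abundance.tsv` and `abundance.h5`. The helper name `check_kallisto_files` is hypothetical and not part of tximport or rhdf5; it only wraps `read.delim` and `rhdf5::h5ls` so that each reader is tried on its own and any error message is reported.

```r
# Hypothetical helper (not part of tximport or rhdf5): try to read each
# kallisto output file directly and report which reader fails.
check_kallisto_files <- function(dir) {
  tsv <- file.path(dir, "abundance.tsv")
  h5  <- file.path(dir, "abundance.h5")

  # Plain-text abundance table: base R only, no HDF5 involved.
  tsv_ok <- tryCatch({
    tab <- read.delim(tsv, stringsAsFactors = FALSE)
    nrow(tab) > 0
  }, error = function(e) {
    message("read.delim failed: ", conditionMessage(e))
    FALSE
  })

  # HDF5 file: listing its contents is enough to test whether it opens.
  # Stays NA if the file is absent or rhdf5 is not installed.
  h5_ok <- NA
  if (file.exists(h5) && requireNamespace("rhdf5", quietly = TRUE)) {
    h5_ok <- tryCatch({
      rhdf5::h5ls(h5)
      TRUE
    }, error = function(e) {
      message("h5ls failed: ", conditionMessage(e))
      FALSE
    })
  }

  list(tsv_ok = tsv_ok, h5_ok = h5_ok)
}
```

Calling `check_kallisto_files(".")` from the directory that holds the two files would show whether the failure is confined to the HDF5 reader, independently of tximport.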


Hey Mike, I think that this is either a Kallisto or rhdf5 issue. Potential fixes here:


h5read(filename) returns the same error as above. read.delim works fine.

So the problem lies with my h5read library? And why does this function get called when I try to read the tsv file?


You can bypass the rhdf5 issue if you set dropInfReps=TRUE when reading in the TSV.
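
A minimal sketch of that workaround, reusing the file name and arguments from the script in the question. The sample name and the guards around the call are illustrative assumptions. `dropInfReps` is an existing `tximport` argument that skips the inferential (bootstrap) replicates, which appears to be the step that reaches for rhdf5 even when the TSV file is supplied.

```r
# Assumed layout: abundance.tsv sits in the working directory, as in the
# script in the question. The sample name is a placeholder.
files <- "abundance.tsv"
names(files) <- "sample1"

# Guards are for illustration only: run the import when tximport is
# installed and the file is present.
if (requireNamespace("tximport", quietly = TRUE) && file.exists(files)) {
  # dropInfReps = TRUE skips the inferential replicates, so the HDF5
  # reader should not be needed for this import.
  txi.kallisto <- tximport::tximport(files, type = "kallisto",
                                     txOut = TRUE, dropInfReps = TRUE)
  print(head(txi.kallisto$counts))
}
```

The trade-off is that the bootstrap information stored in `abundance.h5` is not imported this way, so only the point estimates are available downstream.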


Thank you this worked!


If you could try one of the solutions posted, it may help to narrow down the problem. It could be related to how Kallisto saves files under certain system configurations, but it is not clear.


Thank you for your answer. If I got this right, the first solution seems to be a fix to the h5 package. I got the package from BioC just today, so shouldn't the issue be resolved in the version I downloaded? I cannot test the second fix since I cannot set up a conda environment. I will bookmark it for the future, since I might need to use the h5 files later.


Yes, the first solution that I posted relates to the rhdf5 package - in order to utilise the bootstrapped counts from Kallisto, you'd need to go this route and not the TSV route. That fix by Mike Smith ('grimbough', on GitHub) came 1 month after the current version of Bioconductor (v3.10) was released; so, it may not yet have actually propagated to the official release branch. You could try to obtain the development version or download the package straight from GitHub to see if that works. I am not fully convinced that this issue relates to rhdf5 though; rather, that it relates to the HDF5 library (outside R) and how Kallisto utilises different versions of this library.
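
A rough sketch of the GitHub route, under stated assumptions: the repository name `grimbough/rhdf5` is a guess based on the GitHub user name mentioned above and is not given in this thread, and the wrapper function `install_rhdf5_from_github` is hypothetical. `BiocManager::install()` accepts a `"user/repo"` string for GitHub sources when the `remotes` package is available. Whether the current GitHub version builds against an older R or Bioconductor release is not guaranteed.

```r
# Hypothetical wrapper; nothing is installed until the function is called.
# The repository name "grimbough/rhdf5" is an assumption, not taken from
# the thread.
install_rhdf5_from_github <- function(repo = "grimbough/rhdf5") {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Installing from a "user/repo" GitHub source relies on 'remotes'.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  BiocManager::install(repo)
}
```

After running `install_rhdf5_from_github()` and restarting R, `packageVersion("rhdf5")` shows which version is actually loaded.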