Question

tximport: kallisto .h5 files not recognized as such when they have a different name

Lucas Carey ▴ 40

I have a kallisto-created abundance.h5 file, but the filename (ES_1_high_28881_CGATGT_abundance.h5) includes information about the sample. When I try to import this file using tximport v1.10.0, I get an error: tximport treats the file as a .tsv file.

However, if I create a symlink named abundance.h5, I can load that perfectly well.

Is there a way to tell tximport that the file really is a kallisto h5 file?

-Lucas

> data <- tximport('ES_1_high_28881_CGATGT_abundance.h5', type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 Error in read_tokens_(data, tokenizer, col_specs, col_names, locale_, :
embedded nul in string: '\0\xfbLΦ(\x98Hr1'
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.

> system('ln -s ES_1_high_28881_CGATGT_abundance.h5 abundance.h5')

> data <- tximport( 'abundance.h5' , type = "kallisto", tx2gene=tx2gene , ignoreAfterBar=TRUE )
1
summarizing abundance
summarizing counts
summarizing length
summarizing inferential replicates

tximport kallisto • 1.5k views

Updated 5.3 years ago by Gordon Smyth • written 5.3 years ago by Lucas Carey

score 1 · Answer 1 · 2018-12-31

Always something about posting to a forum that makes me find an answer.

This works, but is messy:

1. Load the function read_kallisto_h5() from tximport's helper.R.

2. Pass it to tximport() with importer = read_kallisto_h5:

data <- tximport('ES_1_high_28881_CGATGT_abundance.h5', type = "kallisto", txOut = FALSE, tx2gene = tx2gene, ignoreAfterBar = TRUE, importer = read_kallisto_h5)
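
A minimal sketch of the same workaround without sourcing helper.R by hand. It assumes that read_kallisto_h5() can be reached as an internal function with tximport:::read_kallisto_h5 (internal functions can change between releases), that the rhdf5 package is installed, and that tx2gene is defined as in the question. The wrapper name import_renamed_h5 is hypothetical.

```r
# Hypothetical wrapper: import kallisto HDF5 files whose names are not
# exactly "abundance.h5" by supplying the HDF5 reader explicitly.
import_renamed_h5 <- function(files, tx2gene, ...) {
  # Fail early with a readable message if any path is wrong.
  not_found <- files[!file.exists(files)]
  if (length(not_found) > 0) {
    stop("file(s) not found: ", paste(not_found, collapse = ", "))
  }
  # Assumption: read_kallisto_h5() is an internal tximport function,
  # so it is accessed with ':::' and may change without notice.
  tximport::tximport(
    files,
    type = "kallisto",
    tx2gene = tx2gene,
    ignoreAfterBar = TRUE,
    importer = tximport:::read_kallisto_h5,
    ...
  )
}
```

Usage would then look like data <- import_renamed_h5('ES_1_high_28881_CGATGT_abundance.h5', tx2gene).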

score 0 · Answer 2 · 2018-12-31

Gordon Smyth 50k

WEHI, Melbourne, Australia

kallisto creates a directory for each RNA sample. I would suggest that you don't rename any of the kallisto output files in the directory, because downstream tools like tximport, edgeR::catchKallisto and sleuth assume the files have standard names. Instead, just rename the directory itself to have an informative name.
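
A sketch of the layout described above, with one kallisto output directory per sample and the sample information carried by the directory name. The base directory kallisto_out and the second sample name are made up for illustration; only the file name abundance.h5 is the kallisto default.

```r
# Hypothetical layout: kallisto_out/<sample>/abundance.h5
base_dir <- "kallisto_out"
samples <- c("ES_1_high_28881_CGATGT",   # sample name taken from the question
             "ES_2_low_28882_TGACCA")    # made-up second sample

# The directory name identifies the sample; the file name stays standard.
files <- file.path(base_dir, samples, "abundance.h5")
names(files) <- samples  # named paths keep the samples labelled

print(files)

# With the files in place and tx2gene defined, the import would then be:
# txi <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene,
#                           ignoreAfterBar = TRUE)
```

Because every file is still called abundance.h5, no custom importer is needed in this arrangement.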

Yes. Also, the tximeta package looks within the directories for specific files with useful metadata about the experiment and the transcripts (particularly in the case of Salmon output). Best to leave the directory contents unchanged, as Gordon says, and rename only the directory itself.

Michael Love 41k • 5.3 years ago