tximport: kallisto .h5 files not recognized as such when they have a different name
Lucas Carey ▴ 40
@lucas-carey-3898

I have a kallisto-created abundance.h5 file, but its filename (ES_1_high_28881_CGATGT_abundance.h5) includes information about the sample. When I try to import this file using tximport v1.10.0, I get an error: tximport treats the file as a .tsv file.

However, if I create a symlink named abundance.h5, I can load that perfectly well. 

Is there a way to tell tximport that the file really is a kallisto h5 file? 

-Lucas

 

> data <- tximport( 'ES_1_high_28881_CGATGT_abundance.h5' , type = "kallisto" , tx2gene=tx2gene , ignoreAfterBar=TRUE )
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 Error in read_tokens_(data, tokenizer, col_specs, col_names, locale_,  :
  embedded nul in string: '\0\xfbLΦ(\x98Hr1'
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.

> system('ln -s ES_1_high_28881_CGATGT_abundance.h5 abundance.h5')

> data <- tximport( 'abundance.h5' , type = "kallisto", tx2gene=tx2gene , ignoreAfterBar=TRUE )
1
summarizing abundance
summarizing counts
summarizing length
summarizing inferential replicates
Lucas Carey ▴ 40
@lucas-carey-3898

There's always something about posting to a forum that makes me find an answer.

This works, but is messy: 

1. Load the function read_kallisto_h5() from tximport's helper.R

2. Set importer = read_kallisto_h5 in the call to tximport():

data <- tximport(  'ES_1_high_28881_CGATGT_abundance.h5' , type = "kallisto", txOut = FALSE , tx2gene=tx2gene , ignoreAfterBar=TRUE , importer = read_kallisto_h5 )
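A possibly tidier variant of the same workaround, as a sketch only: instead of sourcing helper.R by hand, R's `:::` operator can pull an unexported function straight out of the package namespace. This assumes read_kallisto_h5() is an internal (unexported) tximport function, which is what the steps above imply. The wrapper name import_renamed_kallisto_h5 is made up for illustration, and internal functions can change between package versions.

```r
# Hypothetical wrapper (the name is illustrative, not part of tximport).
# Assumption: read_kallisto_h5() exists in tximport's namespace, as the
# post above indicates; `:::` is used because it is not assumed to be exported.
import_renamed_kallisto_h5 <- function(files, tx2gene, ...) {
  if (!requireNamespace("tximport", quietly = TRUE)) {
    stop("The tximport package is needed for this import.")
  }
  tximport::tximport(
    files,
    type = "kallisto",
    tx2gene = tx2gene,
    ignoreAfterBar = TRUE,
    # Force the HDF5 reader regardless of what the file is called
    importer = tximport:::read_kallisto_h5,
    ...
  )
}
```

It would be called like the original command, e.g. `data <- import_renamed_kallisto_h5('ES_1_high_28881_CGATGT_abundance.h5', tx2gene)`.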

 

@gordon-smyth
WEHI, Melbourne, Australia

kallisto creates a directory for each RNA sample. I would suggest that you don't rename any of the kallisto output files in the directory, because downstream functions like tximport, edgeR::catchKallisto and sleuth assume the files have standard names. Instead, just rename the directory itself to have an informative name.


Yes. Also, the tximeta package looks within the directories for specific files with useful metadata about the experiment and the transcripts (particularly in the case of Salmon output). Best to leave the contents of the directories unchanged, as Gordon says, and only rename the directory itself.
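
For illustration, a minimal sketch of the layout suggested in this thread: one directory per sample, named after the sample, each holding an untouched abundance.h5. The root directory, the second sample name and the placeholder files are invented here so the snippet runs on its own; with real kallisto output, only the last few lines would be needed.

```r
# Stand-in for the real kallisto output root (hypothetical path)
base_dir <- tempfile("kallisto_out_")
# The first name comes from the question; the second is made up
samples <- c("ES_1_high_28881_CGATGT", "ES_2_low_28882_TGACCA")

# Create empty placeholder files so the sketch is self-contained
for (s in samples) {
  dir.create(file.path(base_dir, s), recursive = TRUE)
  file.create(file.path(base_dir, s, "abundance.h5"))
}

# Sample information lives in the directory name, not the file name
sample_dirs <- list.dirs(base_dir, recursive = FALSE)
files <- file.path(sample_dirs, "abundance.h5")
names(files) <- basename(sample_dirs)
stopifnot(all(file.exists(files)))

# With real output, the named vector can be passed on, for example:
# txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)
```

Naming the vector gives each sample a label taken from its directory, while every file keeps its standard name.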
