Question: tximport error with kallisto .h5 files
4 months ago by
USA / Cincinnati / Cincinnati Children's Hospital Medical Center
steve.standage wrote:

This is my first time analyzing RNA sequencing data for gene expression. I am trying to import count data from Kallisto to DESeq2 using the tximport package following the instructions here. After running this code:

filenames <- list.files("./Data", full.names = TRUE, pattern = "*abundance.h5")
files <- filenames %>% `names<-`(str_extract(filenames, "SWS[:digit:]*"))

txi.kallisto <- tximport(files, type = "kallisto", txOut = TRUE)

I get the following error:

Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 Warning: 4894 parsing failures.
row         col  expected        actual                        file
  2 <U+0089>HDF           embedded null './Data/SWS01_abundance.h5'
  2 NA          1 columns 2 columns     './Data/SWS01_abundance.h5'
  5 <U+0089>HDF           embedded null './Data/SWS01_abundance.h5'
  9 <U+0089>HDF           embedded null './Data/SWS01_abundance.h5'
 10 <U+0089>HDF           embedded null './Data/SWS01_abundance.h5'
... ........... ......... ............. ...........................
See problems(...) for more details.

Error in tximport(files, type = "kallisto", tx2gene = tx2gene, txOut = TRUE) : 
  all(c(lengthCol, abundanceCol) %in% names(raw)) is not TRUE
In addition: Warning message:
Unnamed `col_types` should have the same length as `col_names`. Using smaller of the two.

I'm trying to import the .h5 files, but when I peek into the .tsv files, they are formatted like this:

# A tibble: 105,129 x 5
   target_id                    length eff_length est_counts   tpm
   <chr>                         <dbl>      <dbl>      <dbl> <dbl>
 1 ENSMUST00000177564.1-Trdd2       16         17          0     0
 2 ENSMUST00000196221.1-Trdd1        9         10          0     0
 3 ENSMUST00000179664.1-Trdd1       11         12          0     0
 4 ENSMUST00000178537.1-Trbd1       12         13          0     0
 5 ENSMUST00000178862.1-Trbd2       14         15          0     0
 6 ENSMUST00000179520.1-Ighd4-1     11         12          0     0
 7 ENSMUST00000179883.1-Ighd3-2     16         17          0     0
 8 ENSMUST00000195858.1-Ighd5-6     10         11          0     0
 9 ENSMUST00000179932.1-Ighd5-6     12         13          0     0
10 ENSMUST00000180001.1-Ighd2-8     17         18          0     0
# ... with 105,119 more rows

Here's my session info:

R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.22.2               SummarizedExperiment_1.12.0 DelayedArray_0.8.0          BiocParallel_1.16.5         matrixStats_0.54.0          tximport_1.10.1            
 [7] rhdf5_2.26.2                GenomicFeatures_1.34.1      AnnotationDbi_1.44.0        Biobase_2.42.0              GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
[13] IRanges_2.16.0              S4Vectors_0.20.1            BiocGenerics_0.28.0         forcats_0.3.0               stringr_1.3.1               dplyr_0.7.8                
[19] purrr_0.2.5                 readr_1.3.1                 tidyr_0.8.2                 tibble_1.4.2                ggplot2_3.1.0               tidyverse_1.2.1            

loaded via a namespace (and not attached):
 [1] colorspace_1.3-2         htmlTable_1.13.1         XVector_0.22.0           base64enc_0.1-3          rstudioapi_0.9.0         bit64_0.9-7              fansi_0.4.0             
 [8] lubridate_1.7.4          xml2_1.2.0               splines_3.5.2            geneplotter_1.60.0       knitr_1.21               Formula_1.2-3            jsonlite_1.6            
[15] Rsamtools_1.34.0         broom_0.5.1              annotate_1.60.0          cluster_2.0.7-1          compiler_3.5.2           httr_1.4.0               backports_1.1.3         
[22] assertthat_0.2.0         Matrix_1.2-15            lazyeval_0.2.1           cli_1.0.1                acepack_1.4.1            htmltools_0.3.6          prettyunits_1.0.2       
[29] tools_3.5.2              bindrcpp_0.2.2           gtable_0.2.0             glue_1.3.0               GenomeInfoDbData_1.2.0   Rcpp_1.0.0               cellranger_1.1.0        
[36] Biostrings_2.50.2        nlme_3.1-137             rtracklayer_1.42.1       xfun_0.4                 rvest_0.3.2              XML_3.98-1.16            zlibbioc_1.28.0         
[43] scales_1.0.0             hms_0.4.2                RColorBrewer_1.1-2       yaml_2.2.0               memoise_1.1.0            gridExtra_2.3            biomaRt_2.38.0          
[50] rpart_4.1-13             latticeExtra_0.6-28      stringi_1.2.4            RSQLite_2.1.1            genefilter_1.64.0        checkmate_1.8.5          rlang_0.3.1             
[57] pkgconfig_2.0.2          bitops_1.0-6             lattice_0.20-38          Rhdf5lib_1.4.2           bindr_0.1.1              GenomicAlignments_1.18.1 htmlwidgets_1.3         
[64] bit_1.1-14               tidyselect_0.2.5         plyr_1.8.4               magrittr_1.5             R6_2.3.0                 generics_0.0.2           Hmisc_4.1-1             
[71] DBI_1.0.0                pillar_1.3.1             haven_2.0.0              foreign_0.8-71           withr_2.1.2              survival_2.43-3          RCurl_1.95-4.11         
[78] nnet_7.3-12              modelr_0.1.2             crayon_1.3.4             utf8_1.1.4               progress_1.2.0           locfit_1.5-9.1           grid_3.5.2              
[85] readxl_1.2.0             data.table_1.11.8        blob_1.1.1               digest_0.6.18            xtable_1.8-3             munsell_0.5.0

Any ideas to help solve my import problem?

Thanks for your help!

Tags: deseq2, tximport, kallisto
modified 4 months ago by Michael Love • written 4 months ago by steve.standage
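
The `<U+0089>HDF` entries in the parsing failures are the first bytes of the HDF5 file signature, which suggests the `.h5` files themselves are fine and were simply read as text by `read_tsv`. A minimal base R sketch for checking this, assuming the signature sits at byte 0 (no HDF5 user block); `is_hdf5` is a hypothetical helper, not part of tximport or rhdf5:

```r
# HDF5 files begin with this fixed 8-byte signature: \x89 H D F \r \n \x1a \n
hdf5_signature <- as.raw(c(0x89, 0x48, 0x44, 0x46, 0x0d, 0x0a, 0x1a, 0x0a))

# Hypothetical helper: TRUE if the file starts with the HDF5 signature
is_hdf5 <- function(path) {
  header <- readBin(path, what = "raw", n = 8L)
  identical(header, hdf5_signature)
}

# Demo on two throwaway files (placeholders, not real kallisto output)
h5_like  <- tempfile(fileext = ".h5")
tsv_like <- tempfile(fileext = ".tsv")
writeBin(hdf5_signature, h5_like)
writeLines("target_id\tlength\teff_length\test_counts\ttpm", tsv_like)

print(is_hdf5(h5_like))
print(is_hdf5(tsv_like))
```

On the real data, `is_hdf5("./Data/SWS01_abundance.h5")` returning `TRUE` would point to the import path, rather than the file contents, as the thing to fix.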

When I run the commands using the abundance.tsv files:

txdb <- makeTxDbFromGFF("gencode.vM20.annotation.gff3.gz") # Pulled this file from: https://www.gencodegenes.org/mouse/release_M20.html
k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
txi.kallisto.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreAfterBar = TRUE)

It imports all of my files, but then fails with the following error:

Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Example IDs (file): [ENSMUST00000177564.1-Trdd2, ENSMUST00000196221.1-Trdd1, ENSMUST00000179664.1-Trdd1, ...]

Example IDs (tx2gene): [ENSMUST00000193812.1, ENSMUST00000082908.1, ENSMUST00000192857.1, ...]

  This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'.

Thanks again!

written 4 months ago by steve.standage

Consider taking the advice that is printed in the error message.

written 4 months ago by Michael Love
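
One way to see which part of the error message's advice applies to these IDs is to try both normalizations on the example IDs. The sketch below uses plain regular expressions as an approximation of what `ignoreAfterBar` and `ignoreTxVersion` are described as doing; it is not tximport's own code, and the `tx2gene_ids` values are illustrative (the file IDs with the gene-name suffix removed), not taken from the annotation:

```r
# Example IDs as printed in the error message (quantification files)
file_ids <- c("ENSMUST00000177564.1-Trdd2", "ENSMUST00000196221.1-Trdd1")

# Illustrative tx2gene IDs: assumed to be the same transcripts without suffix
tx2gene_ids <- c("ENSMUST00000177564.1", "ENSMUST00000196221.1")

# Exact matching fails because of the "-Trdd2"-style suffix
exact_match <- any(file_ids %in% tx2gene_ids)

# Dropping everything after a "|" changes nothing: these IDs contain no bar
after_bar <- sub("\\|.*", "", file_ids)
bar_changes_ids <- !identical(after_bar, file_ids)

# Dropping everything from the first "." removes the version and the suffix
file_no_version    <- sub("\\..*", "", file_ids)
tx2gene_no_version <- sub("\\..*", "", tx2gene_ids)
version_match <- all(file_no_version %in% tx2gene_no_version)

print(c(exact = exact_match, bar = bar_changes_ids, version = version_match))
```

If the real annotation behaves like this, `ignoreTxVersion = TRUE` would be the option to try in the `tximport()` call; note that matching then ignores transcript versions on both sides.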
Answer: tximport error with kallisto .h5 files
4 months ago by
Michael Love
United States
Michael Love wrote:

tximport, as currently implemented, assumes that you have not renamed the output files of the quantification methods. I think it is generally a good idea not to modify the filenames themselves, so I probably won't change this. A very recent post here showed some code to get around it by specifying your own importer.

written 4 months ago by Michael Love
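
A simple alternative to writing a custom importer is to restore the file name kallisto writes by default, giving each sample its own directory. A minimal sketch, assuming the renamed files follow the `<sample>_abundance.h5` pattern seen in the question; `restore_kallisto_layout` is a hypothetical helper, and the demo uses empty placeholder files rather than real kallisto output:

```r
# Hypothetical helper: copy "<sample>_abundance.h5" files into per-sample
# directories as "<sample>/abundance.h5" and return the new paths,
# named by sample.
restore_kallisto_layout <- function(renamed, out_dir) {
  # Sample name = file name minus the "_abundance.h5" suffix (assumed pattern)
  samples <- sub("_abundance\\.h5$", "", basename(renamed))
  targets <- file.path(out_dir, samples, "abundance.h5")
  for (i in seq_along(renamed)) {
    dir.create(dirname(targets[i]), recursive = TRUE, showWarnings = FALSE)
    file.copy(renamed[i], targets[i], overwrite = TRUE)
  }
  names(targets) <- samples
  targets
}

# Demo with empty placeholder files in a temporary directory
tmp <- tempfile("kallisto_demo")
dir.create(tmp)
renamed <- file.path(tmp, c("SWS01_abundance.h5", "SWS02_abundance.h5"))
file.create(renamed)

files <- restore_kallisto_layout(renamed, file.path(tmp, "by_sample"))
print(names(files))
print(basename(files))
```

With the real files, the returned named vector could be passed as `files` to `tximport(files, type = "kallisto", txOut = TRUE)`, which keeps the sample names as column names.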

Thank you for your help!

written 3 months ago by steve.standage
