Question

unable to get tximport to work on data from kallisto

swamyvinny ▴ 10

Hi, I'm an undergrad working in a lab and I'm new to Bioconductor, so please bear with me.

I used kallisto to get transcript-level abundances for my data, and am now trying to use tximport to summarize them to the gene level. I followed the instructions in the vignette, but I get this error message:

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 
  
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

Here is what I'm doing.

I used a FASTA file from Ensembl, so I used the EnsDb.Hsapiens.v86 package instead of the one in the vignette, which is what other posts on the forum said to do (I couldn't find one for the current Ensembl 88 release).

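library(EnsDb.Hsapiens.v86)
txdb <- EnsDb.Hsapiens.v86
# All Ensembl gene IDs, then the transcripts annotated to each gene
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
# tximport expects the transcript ID in column 1 and the gene ID in column 2
tx2gene <- df[, 2:1]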

When I looked at the first few entries, the tx2gene table looked the way it was supposed to:

head(tx2gene)
           TXNAME          GENEID
1 ENST00000373020 ENSG00000000003
2 ENST00000494424 ENSG00000000003
3 ENST00000496771 ENSG00000000003
4 ENST00000612152 ENSG00000000003
5 ENST00000614008 ENSG00000000003
6 ENST00000373031 ENSG00000000005

When I go to use the tximport function, I get the error message:

 library(tximport)
 txi<- tximport("abundance.tsv", type = "kallisto", tx2gene=tx2gene)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

I double-checked that my file is in the working directory. I'm not too familiar with Bioconductor or R yet, so I'm a little stumped. Any help would be appreciated.

Here is my sessionInfo():

R version 3.4.0 Patched (2017-05-08 r72665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] tximport_1.4.0           EnsDb.Hsapiens.v86_2.1.0
 [3] ensembldb_2.0.1          AnnotationFilter_1.0.0  
 [5] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0    
 [7] Biobase_2.36.2           GenomicRanges_1.28.1    
 [9] GenomeInfoDb_1.12.0      IRanges_2.10.0          
[11] S4Vectors_0.14.0         BiocGenerics_0.22.0     
[13] BiocInstaller_1.26.0    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  compiler_3.4.0               
 [3] XVector_0.16.0                AnnotationHub_2.8.1          
 [5] ProtGenerics_1.8.0            bitops_1.0-6                 
 [7] tools_3.4.0                   zlibbioc_1.22.0              
 [9] biomaRt_2.32.0                digest_0.6.12                
[11] rhdf5_2.20.0                  tibble_1.3.0                 
[13] RSQLite_1.1-2                 memoise_1.1.0                
[15] lattice_0.20-35               Matrix_1.2-10                
[17] shiny_1.0.3                   DelayedArray_0.2.2           
[19] DBI_0.6-1                     yaml_2.1.14                  
[21] GenomeInfoDbData_0.99.0       rtracklayer_1.36.0           
[23] httr_1.2.1                    hms_0.3                      
[25] Biostrings_2.44.0             grid_3.4.0                   
[27] R6_2.2.1                      XML_3.98-1.7                 
[29] BiocParallel_1.10.1           readr_1.1.0                  
[31] htmltools_0.3.6               Rsamtools_1.28.0             
[33] matrixStats_0.52.2            GenomicAlignments_1.12.0     
[35] SummarizedExperiment_1.6.1    xtable_1.8-2                 
[37] mime_0.5                      interactiveDisplayBase_1.14.0
[39] httpuv_1.3.3                  RCurl_1.95-4.8               
[41] lazyeval_0.2.0

tximport kallisto • 3.6k views

ADD COMMENT • link updated 6.9 years ago by Steve Lianoglou ★ 13k • written 6.9 years ago by swamyvinny ▴ 10

score 1 · Answer 1 · 2017-05-11

Steve Lianoglou ★ 13k

Look at the sequence file you used to generate your kallisto index. Are the IDs (names) of the transcript sequences in there the same as the names you extracted from the EnsDb?

A common mistake is the use of "versioned" Ensembl transcript identifiers in one source but not the other, i.e. one has ENST00001234.1 and the other has ENST00001234. If that's the case, though, the tximport function provides the ignoreTxVersion argument to handle that.
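A quick way to confirm that a version suffix is the culprit is to compare the two sets of IDs directly. The sketch below uses only base R and made-up IDs (the version numbers and the helper name strip_version are assumptions for illustration, not part of tximport); with real data the first vector would come from the target_id column of abundance.tsv and the second from the first column of tx2gene.

```r
# Hypothetical IDs: quant_ids mimics the target_id column of a kallisto
# abundance.tsv built from an Ensembl FASTA (versioned); tx2gene_ids mimics
# the TXNAME column of tx2gene (unversioned).
quant_ids   <- c("ENST00000373020.8", "ENST00000494424.1", "ENST00000496771.5")
tx2gene_ids <- c("ENST00000373020", "ENST00000494424", "ENST00000496771")

# Fraction of quantified transcripts found in tx2gene as-is
raw_match <- mean(quant_ids %in% tx2gene_ids)

# Drop a trailing ".<digits>" version suffix, then compare again
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)
stripped_match <- mean(strip_version(quant_ids) %in% tx2gene_ids)

print(c(raw = raw_match, stripped = stripped_match))
```

If the raw fraction is 0 and the stripped fraction is close to 1, the annotations agree apart from the version suffix, and passing ignoreTxVersion = TRUE to tximport should be enough.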

You might also want to consider building a salmon/kallisto index directly from the sequences that correspond to the version of the EnsDb you are using. I have some code I can dig up that (I think) does exactly that, if you like.

ADD COMMENT • link 6.9 years ago Steve Lianoglou ★ 13k

Ah, that did the trick. Thank you so much.

On a side note, how do I run multiple files at once instead of one at a time?

ADD REPLY • link 6.9 years ago swamyvinny ▴ 10

What do you mean by "run multiple files at once"? Sorry if I'm being obtuse, but I have no idea what you might be referring to. Can you provide more context for the question?

ADD REPLY • link 6.9 years ago Steve Lianoglou ★ 13k

Yes, note that the first argument to tximport is files, with an "s" on the end. files is a character vector of the paths to the multiple files (although of course it can be of length 1 if you only have a single sample). Please see the examples of tximport in the man pages and vignette, where multiple files are specified.
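As a rough illustration of what such a files vector can look like, here is a minimal base R sketch. The directory name kallisto_out and the sample names are hypothetical; it assumes one kallisto output directory per sample, each containing its own abundance.tsv.

```r
# Hypothetical layout: kallisto_out/<sample>/abundance.tsv, one directory per sample
samples <- c("sample1", "sample2", "sample3")
files <- file.path("kallisto_out", samples, "abundance.tsv")

# Label each path with its sample name so the samples stay identifiable
names(files) <- samples

print(files)

# With real files in place, the whole vector goes into a single call:
# txi <- tximport(files, type = "kallisto", tx2gene = tx2gene)
```

The tximport call is left commented out because it needs the actual quantification files; the rest runs as-is and can be checked with file.exists(files) before importing.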

ADD REPLY • link 6.9 years ago Michael Love 41k

If you're using kallisto to quantify your samples, I'd suggest using sleuth instead of tximport followed by another DE package, as sleuth is more accurate. See https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4324.html.

ADD REPLY • link 6.8 years ago lakigigar • 0