Question: unable to get tximport to work on data from kallisto
swamyvinny0 wrote:

Hi, I'm an undergrad working in a lab and I'm new to Bioconductor, so please bear with me.

I used kallisto to get transcript-level abundances for my data, and am now trying to use tximport to summarize them to gene level. I followed the instructions in the vignette, but I get this error message:

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 
  
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

 

Here is what I'm doing.

I used a FASTA file from Ensembl, so I used the EnsDb.Hsapiens.v86 package instead of the one in the vignette, which is what other posts on the forum said to do (I couldn't find one for the current Ensembl 88 release).

library(EnsDb.Hsapiens.v86)
txdb<- EnsDb.Hsapiens.v86
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
tx2gene <- df[, 2:1]

When I looked at the first few entries, the tx2gene data frame looked the way it was supposed to:

head(tx2gene)
           TXNAME          GENEID
1 ENST00000373020 ENSG00000000003
2 ENST00000494424 ENSG00000000003
3 ENST00000496771 ENSG00000000003
4 ENST00000612152 ENSG00000000003
5 ENST00000614008 ENSG00000000003
6 ENST00000373031 ENSG00000000005

When I go to use the tximport function, I get the error message:

 

 library(tximport)
 txi<- tximport("abundance.tsv", type = "kallisto", tx2gene=tx2gene)
Note: importing `abundance.h5` is typically faster than `abundance.tsv`
reading in files with read_tsv
1
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) : 

  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.

 

I double-checked that my file is in the working directory. I'm not too familiar with Bioconductor or R yet, so I'm a little stumped. Any help would be appreciated.

 

Here is my sessionInfo():

 

R version 3.4.0 Patched (2017-05-08 r72665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] tximport_1.4.0           EnsDb.Hsapiens.v86_2.1.0
 [3] ensembldb_2.0.1          AnnotationFilter_1.0.0  
 [5] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0    
 [7] Biobase_2.36.2           GenomicRanges_1.28.1    
 [9] GenomeInfoDb_1.12.0      IRanges_2.10.0          
[11] S4Vectors_0.14.0         BiocGenerics_0.22.0     
[13] BiocInstaller_1.26.0    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  compiler_3.4.0               
 [3] XVector_0.16.0                AnnotationHub_2.8.1          
 [5] ProtGenerics_1.8.0            bitops_1.0-6                 
 [7] tools_3.4.0                   zlibbioc_1.22.0              
 [9] biomaRt_2.32.0                digest_0.6.12                
[11] rhdf5_2.20.0                  tibble_1.3.0                 
[13] RSQLite_1.1-2                 memoise_1.1.0                
[15] lattice_0.20-35               Matrix_1.2-10                
[17] shiny_1.0.3                   DelayedArray_0.2.2           
[19] DBI_0.6-1                     yaml_2.1.14                  
[21] GenomeInfoDbData_0.99.0       rtracklayer_1.36.0           
[23] httr_1.2.1                    hms_0.3                      
[25] Biostrings_2.44.0             grid_3.4.0                   
[27] R6_2.2.1                      XML_3.98-1.7                 
[29] BiocParallel_1.10.1           readr_1.1.0                  
[31] htmltools_0.3.6               Rsamtools_1.28.0             
[33] matrixStats_0.52.2            GenomicAlignments_1.12.0     
[35] SummarizedExperiment_1.6.1    xtable_1.8-2                 
[37] mime_0.5                      interactiveDisplayBase_1.14.0
[39] httpuv_1.3.3                  RCurl_1.95-4.8               
[41] lazyeval_0.2.0   
modified 6 months ago by Steve Lianoglou • written 6 months ago by swamyvinny
Genentech
Steve Lianoglou12k wrote:

Look at the sequence file you used to generate your kallisto index. Are the IDs (names) of the transcript sequences in there the same as the names you extracted from the EnsDb?

A common mistake is the use of "versioned" Ensembl transcript identifiers in one source but not the other, i.e. one has ENST00001234.1 and the other has ENST00001234. If that's the case, the tximport function provides the ignoreTxVersion argument to handle it.
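A quick way to check for that kind of mismatch is sketched below in base R. The IDs and version numbers are made up for illustration, and strip_version is a hypothetical helper, not part of tximport; with real data you would compare the target_id column of abundance.tsv against the first column of tx2gene.

```r
# Hypothetical IDs for illustration only: versioned names (as they might
# appear in a kallisto index built from an Ensembl FASTA) versus the
# unversioned names returned by the EnsDb query.
quant_ids   <- c("ENST00000373020.8", "ENST00000494424.1", "ENST00000373031.4")
tx2gene_ids <- c("ENST00000373020", "ENST00000494424", "ENST00000373031")

# Fraction of quantified transcripts found in tx2gene as-is
before <- mean(quant_ids %in% tx2gene_ids)

# Hypothetical helper: drop a trailing ".<digits>" version suffix
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Fraction found once the version suffix is removed
after <- mean(strip_version(quant_ids) %in% tx2gene_ids)

print(c(before = before, after = after))

# With real data, the corresponding tximport call would be along the lines of:
# txi <- tximport("abundance.tsv", type = "kallisto", tx2gene = tx2gene,
#                 ignoreTxVersion = TRUE)
```

If the first fraction is zero and the second is close to one, the version suffix is very likely the cause of the error.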

You might also want to consider building a salmon/kallisto index directly from the sequences that correspond to the version of the EnsDb you are using. I have some code I can dig up that (I think) does exactly that, if you like.

modified 6 months ago • written 6 months ago by Steve Lianoglou

Ah, that did the trick. Thank you so much!

On a side note, how do I run multiple files at once instead of one at a time?

modified 6 months ago • written 6 months ago by swamyvinny

What do you mean "run multiple files at a time?" Sorry if I'm being obtuse but I have no idea what you might be referring to. Can you provide more context to the question?

written 6 months ago by Steve Lianoglou

Yes, note that the first argument to tximport is files, with an "s" on the end. files is a character vector of the paths to the multiple files (although of course it can be of length 1 if you only had a single experiment). Please see the examples of tximport in the man pages and vignette, where multiple files are specified.
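As a rough sketch of what that looks like for kallisto output: the directory name kallisto_out and the sample names below are hypothetical, and the import is only attempted if tximport is installed, the files exist, and a tx2gene table is already in the workspace. As far as I know, the names of the files vector are used as column names in the result.

```r
# Hypothetical layout: one kallisto output directory per sample,
# e.g. kallisto_out/sample1/abundance.tsv
samples <- c("sample1", "sample2", "sample3")
files <- file.path("kallisto_out", samples, "abundance.tsv")
names(files) <- samples  # label each file with its sample name

# Only attempt the import when everything it needs is available
can_import <- requireNamespace("tximport", quietly = TRUE) &&
  all(file.exists(files)) &&
  exists("tx2gene")

if (can_import) {
  # One call imports all samples; ignoreTxVersion = TRUE is only needed
  # when the transcript IDs differ by a version suffix
  txi <- tximport::tximport(files, type = "kallisto", tx2gene = tx2gene,
                            ignoreTxVersion = TRUE)
  print(head(txi$counts))
}
```

The result then has one column per sample rather than one object per file.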

written 6 months ago by Michael Love

If you're using kallisto to quantify your samples, I'd suggest using sleuth rather than tximport followed by another DE package, as sleuth is reported to be more accurate. See https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4324.html.

written 4 months ago by lakigigar