Question: unable to get tximport to work on data from kallisto
17 months ago by
swamyvinny0 wrote:

Hi, I'm an undergrad working in a lab and I'm new to Bioconductor, so please bear with me.

I used kallisto to get transcript-level abundances for my data, and am now trying to use tximport to summarize them to the gene level. I followed the instructions in the vignette, but I get this error message:

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :

None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.

Here is what I'm doing.

I used a FASTA file from Ensembl, so I used the EnsDb.Hsapiens.v86 package instead of the one in the vignette, which is what other posts on the forum said to do (I couldn't find one for the current Ensembl 88 release).

library(EnsDb.Hsapiens.v86)
txdb <- EnsDb.Hsapiens.v86
k <- keys(txdb, keytype = "GENEID")
df <- select(txdb, keys = k, keytype = "GENEID", columns = "TXNAME")
tx2gene <- df[, 2:1]

When I looked at the first few entries, the tx2gene table looked the way it was supposed to:

head(tx2gene)
TXNAME          GENEID
1 ENST00000373020 ENSG00000000003
2 ENST00000494424 ENSG00000000003
3 ENST00000496771 ENSG00000000003
4 ENST00000612152 ENSG00000000003
5 ENST00000614008 ENSG00000000003
6 ENST00000373031 ENSG00000000005

When I go to use the tximport function, I get the error message:

library(tximport)
txi <- tximport("abundance.tsv", type = "kallisto", tx2gene = tx2gene)
Note: importing abundance.h5 is typically faster than abundance.tsv
1
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :

None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.

I double-checked to make sure my file is in the working directory. I'm not too familiar with Bioconductor or R yet, so I'm a little stumped. Any help would be appreciated.

Here is my sessionInfo():

R version 3.4.0 Patched (2017-05-08 r72665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8.1 x64 (build 9600)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils
[7] datasets  methods   base

other attached packages:
[1] tximport_1.4.0           EnsDb.Hsapiens.v86_2.1.0
[3] ensembldb_2.0.1          AnnotationFilter_1.0.0
[5] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0
[7] Biobase_2.36.2           GenomicRanges_1.28.1
[9] GenomeInfoDb_1.12.0      IRanges_2.10.0
[11] S4Vectors_0.14.0         BiocGenerics_0.22.0
[13] BiocInstaller_1.26.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.10                  compiler_3.4.0
[3] XVector_0.16.0                AnnotationHub_2.8.1
[5] ProtGenerics_1.8.0            bitops_1.0-6
[7] tools_3.4.0                   zlibbioc_1.22.0
[9] biomaRt_2.32.0                digest_0.6.12
[11] rhdf5_2.20.0                  tibble_1.3.0
[13] RSQLite_1.1-2                 memoise_1.1.0
[15] lattice_0.20-35               Matrix_1.2-10
[17] shiny_1.0.3                   DelayedArray_0.2.2
[19] DBI_0.6-1                     yaml_2.1.14
[21] GenomeInfoDbData_0.99.0       rtracklayer_1.36.0
[23] httr_1.2.1                    hms_0.3
[25] Biostrings_2.44.0             grid_3.4.0
[27] R6_2.2.1                      XML_3.98-1.7
[31] htmltools_0.3.6               Rsamtools_1.28.0
[33] matrixStats_0.52.2            GenomicAlignments_1.12.0
[35] SummarizedExperiment_1.6.1    xtable_1.8-2
[37] mime_0.5                      interactiveDisplayBase_1.14.0
[39] httpuv_1.3.3                  RCurl_1.95-4.8
[41] lazyeval_0.2.0   
modified 17 months ago by Steve Lianoglou12k • written 17 months ago by swamyvinny0
17 months ago by
Denali
Steve Lianoglou12k wrote:

Look at the sequence file you used to generate your kallisto index. Are the IDs (names) of the transcript sequences in there the same as the names you extracted from the EnsDb?

A common mistake is the use of "versioned" Ensembl transcript identifiers in one source but not the other, i.e. one has ENST00001234.1 and the other has ENST00001234. If that's the case, though, the tximport function provides the ignoreTxVersion argument to handle that.
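To make the mismatch concrete, here is a minimal sketch in base R. The versioned IDs below are made up for illustration (they stand in for what the target_id column of abundance.tsv might contain), and the regular expression is only an approximation of what ignoreTxVersion does, not tximport's actual code.

```r
# Hypothetical versioned IDs, standing in for the target_id column of abundance.tsv
quant_ids <- c("ENST00000373020.8", "ENST00000494424.1", "ENST00000373031.4")

# Unversioned IDs, as in the TXNAME column of the tx2gene table shown in the question
tx2gene_ids <- c("ENST00000373020", "ENST00000494424", "ENST00000373031")

# With the version suffix in place, nothing matches
n_raw <- sum(quant_ids %in% tx2gene_ids)
n_raw  # 0

# Dropping the trailing ".<digits>" makes the two sets agree
stripped_ids <- sub("\\.[0-9]+$", "", quant_ids)
n_stripped <- sum(stripped_ids %in% tx2gene_ids)
n_stripped  # 3

# In tximport itself, the equivalent would be to pass the argument directly, e.g.:
# txi <- tximport("abundance.tsv", type = "kallisto",
#                 tx2gene = tx2gene, ignoreTxVersion = TRUE)
```

Running the same %in% check on your real target_id column and tx2gene[, 1] is a quick way to confirm whether versioning is the problem before calling tximport again.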

You might also want to consider building a salmon/kallisto index directly from the sequences that correspond to the version of the EnsDb you are using. I have some code I can dig up that (I think) does exactly that, if you like.

Ah, that did the trick. Thank you so much!

On a side note, how do I run multiple files at once instead of one at a time?

What do you mean "run multiple files at a time?" Sorry if I'm being obtuse but I have no idea what you might be referring to. Can you provide more context to the question?

Yes, note that the first argument to tximport is files, with an "s" on the end. files is a character vector of the paths to the multiple files (although of course it can be of length 1 if you only had a single experiment). Please see the examples of tximport in the man pages and vignette, where multiple files are specified.
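As a rough sketch of what that can look like for several kallisto runs, assuming each sample was written to its own subdirectory: the directory name kallisto_out and the sample names below are hypothetical, and the tximport call is guarded so it only runs when the package, the files, and a tx2gene table are all available.

```r
# Hypothetical sample names; each is assumed to have its own kallisto output directory
samples <- c("sample1", "sample2", "sample3")

# Build one path per sample, e.g. kallisto_out/sample1/abundance.tsv
files <- file.path("kallisto_out", samples, "abundance.tsv")

# Naming the vector gives the resulting matrices readable column names
names(files) <- samples

# Only attempt the import if everything it needs is actually present
if (requireNamespace("tximport", quietly = TRUE) &&
    all(file.exists(files)) &&
    exists("tx2gene")) {
  txi <- tximport::tximport(files, type = "kallisto",
                            tx2gene = tx2gene, ignoreTxVersion = TRUE)
}
```

The result then has one column per sample rather than one, so there is no need to call tximport separately for each file.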

If you're using kallisto to quantify your samples, I'd suggest using sleuth instead of tximport followed by another DE package, as sleuth is more accurate. See https://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.4324.html.
