Question

Using tximport for data generated by other tools

saumya.kumar • 0

@saumyakumar-13171

Hello,

I want to try tximport with another tool that I am evaluating, eXpress. I programmatically converted the quantification files generated by eXpress into Salmon's quant.sf format and tried importing them. I get the following error:

reading in files
1 2 Error: all(txId == raw[[txIdCol]]) is not TRUE

This seems to point to the transcript IDs. I have checked that column in file 2 and it is not empty. I am using this command:

txi.express <- tximport(Files, type = "salmon", tx2gene = tx2gene)

The tx2gene file consists of Ensembl transcript IDs and gene IDs. I have used this file with Salmon results before and it worked.

Is there a way I can do this? tximport seems to read file 1 without any error; the error appears at file 2.

Thanks,

Saumya

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.12.4                            SummarizedExperiment_1.2.3              
 [3] TxDb.Mmusculus.UCSC.mm10.knownGene_3.2.2 GenomicFeatures_1.24.5                  
 [5] AnnotationDbi_1.34.4                     Biobase_2.32.0                          
 [7] GenomicRanges_1.24.3                     GenomeInfoDb_1.8.7                      
 [9] IRanges_2.6.1                            S4Vectors_0.10.3                        
[11] BiocGenerics_0.18.0                      tximport_1.0.3                          

loaded via a namespace (and not attached):
 [1] genefilter_1.54.2       locfit_1.5-9.1          splines_3.3.1           lattice_0.20-35        
 [5] colorspace_1.3-2        htmltools_0.3.6         rtracklayer_1.32.2      base64enc_0.1-3        
 [9] survival_2.41-3         XML_3.98-1.7            foreign_0.8-68          DBI_0.6-1              
[13] BiocParallel_1.6.6      RColorBrewer_1.1-2      plyr_1.8.4              stringr_1.2.0          
[17] zlibbioc_1.18.0         Biostrings_2.40.2       munsell_0.4.3           gtable_0.2.0           
[21] htmlwidgets_0.8         memoise_1.1.0           latticeExtra_0.6-28     knitr_1.15.1           
[25] geneplotter_1.50.0      biomaRt_2.28.0          htmlTable_1.9           Rcpp_0.12.10           
[29] xtable_1.8-2            acepack_1.4.1           scales_0.4.1            backports_1.0.5        
[33] checkmate_1.8.2         annotate_1.50.1         Hmisc_4.0-3             XVector_0.12.1         
[37] Rsamtools_1.24.0        gridExtra_2.2.1         ggplot2_2.2.1           digest_0.6.12          
[41] stringi_1.1.5           grid_3.3.1              tools_3.3.1             bitops_1.0-6           
[45] magrittr_1.5            RCurl_1.95-4.8          lazyeval_0.2.0          RSQLite_1.1-2          
[49] tibble_1.3.0            Formula_1.2-1           cluster_2.0.6           Matrix_1.2-10          
[53] data.table_1.10.4       rpart_4.1-11            GenomicAlignments_1.8.4 nnet_7.3-12

Tags: tximport, salmon, eXpress

Updated 7.7 years ago by Michael Love • written 8.7 years ago by saumya.kumar

score 1 · Answer 1 · 2017-06-02

Michael Love 43k

@mikelove

tximport needs all the `files` to list the transcripts in the same order. The package is designed to import quantifications of multiple samples that were run with the same software against the same transcriptome. If you want to read in files produced by different software, you can import each file separately with tximport. We do not support merging quantifications across software, though.
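
One way to see which file breaks this assumption is to compare the transcript ID column across files before calling tximport. The sketch below uses base R only. The two mock files, their IDs and the helper `write_quant` are made up for illustration; `Name` is the transcript ID column of Salmon's quant.sf. Reordering rows as shown is only a possible workaround when both files contain exactly the same transcripts; it is not something tximport does for you.

```r
# Hypothetical helper: write a small quant.sf-style table (values are made up).
write_quant <- function(ids, path) {
  df <- data.frame(Name = ids, Length = 1000, EffectiveLength = 800,
                   TPM = seq_along(ids), NumReads = 10 * seq_along(ids))
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Two mock files with the same transcripts in a different order.
f1 <- tempfile(fileext = ".sf")
f2 <- tempfile(fileext = ".sf")
write_quant(c("tx1", "tx2", "tx3"), f1)
write_quant(c("tx3", "tx1", "tx2"), f2)

q1 <- read.delim(f1, stringsAsFactors = FALSE)
q2 <- read.delim(f2, stringsAsFactors = FALSE)

# Same order is what tximport requires; same set is the weaker condition
# under which reordering is possible at all.
same_order <- identical(q1$Name, q2$Name)
same_set   <- setequal(q1$Name, q2$Name)

# If only the order differs, reorder file 2 to match file 1 and rewrite it.
if (same_set && !same_order) {
  q2 <- q2[match(q1$Name, q2$Name), ]
  write.table(q2, f2, sep = "\t", quote = FALSE, row.names = FALSE)
}

q2_fixed <- read.delim(f2, stringsAsFactors = FALSE)
```

If `same_set` is `FALSE`, the files were not quantified against the same set of transcripts, and reordering will not help.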

Written 8.7 years ago by Michael Love

Hi Michael,

I am trying to set up a pipeline for interspecies RNA-seq comparison (human and primates), and I was testing whether Salmon, tximport and DESeq2 would work. I ran Salmon on my fastq files, using an index for each respective species. However, when I try to load them all, together with a concatenated tx2gene annotation file, into a "txi" object using tximport, I get the exact same error message as described above ("Error: all(txId == raw[[txIdCol]]) is not TRUE"). When I load the samples and tx2gene files for each species separately, everything works fine. As I know the orthologous correspondence between the genes from the different species, I thought of somehow merging these individual "txi" objects and loading the result into DESeq2.

However, I'm not sure how to proceed: is it possible to merge the txi objects by a set of orthologous genes into one txi object? If so, can that object then be loaded into DESeq2?

Thanks, Alex

Written 7.7 years ago by akozlenkov

I'd recommend doing this manually. `tximport()` assumes that all the `files` have the same set of transcripts.

Written 7.7 years ago by Michael Love

Thanks!

A couple more questions:

(btw, shall I continue here, or start a new thread?)

1) I see that the "txi" object is a list with 4 elements, 3 of which are matrices (abundance, counts, length). Are all of these needed to load the data into DESeq2? Shall I manually re-create all these elements separately, by merging from individual species-specific "txi" data?

2) I have a table of orthologous gene IDs between the 3 species, so at some point in the analysis I will need to convert the primate gene IDs to human gene IDs. Is it better to do that before running tximport on them? (To do so, I can generate tx2gene tables with, say, macaqueTxID vs humanGeneID.) Alternatively, I could do it just before merging the individual txi datasets into one "txi" object.
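
A table of the kind described in 2) could be built by joining a species-specific tx2gene table to the ortholog table. This is a minimal base R sketch under stated assumptions: all IDs and the column names `macaque_gene` and `human_gene` are hypothetical, and the ortholog table is assumed to be one-to-one. tximport reads the transcript ID from the first column of tx2gene and the gene ID from the second.

```r
# Hypothetical macaque transcript-to-gene table (IDs are made up).
macaque_tx2gene <- data.frame(
  TXNAME = c("macTx1", "macTx2", "macTx3", "macTx4"),
  GENEID = c("macGeneA", "macGeneA", "macGeneB", "macGeneC"),
  stringsAsFactors = FALSE
)

# Hypothetical one-to-one ortholog table; macGeneC has no human ortholog here.
orthologs <- data.frame(
  macaque_gene = c("macGeneA", "macGeneB"),
  human_gene   = c("humGene1", "humGene2"),
  stringsAsFactors = FALSE
)

# Inner join on the macaque gene ID: transcripts of genes without an
# ortholog are dropped from the result.
joined <- merge(macaque_tx2gene, orthologs,
                by.x = "GENEID", by.y = "macaque_gene")

# Keep transcript ID first and human gene ID second.
tx2gene_mac_to_human <- joined[, c("TXNAME", "human_gene")]
names(tx2gene_mac_to_human) <- c("TXNAME", "GENEID")
```

Transcripts absent from this table (here `macTx4`) would not contribute to any gene-level summary.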

Written 7.7 years ago by akozlenkov

1) It is safest to make a list with all 4 elements: the 3 merged matrices, and then the logical countsFromAbundance value.

2) You can use tximport to summarize straight from primate transcripts to human genes, by giving it a tx2gene table that maps primate transcript IDs to human gene IDs.
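
The manual merge from 1) might look roughly like the following. It is a sketch, not a supported tximport feature: `make_txi` and `merge_txi` are hypothetical helpers, the matrices are mock data, and each per-species list is assumed to be already summarized to human gene IDs as in 2). The mock lists store `countsFromAbundance` as the string "no", which is an assumption; carry over whatever value your own per-species imports contain.

```r
# Hypothetical helper: build a mock txi-like list with human gene IDs as row names.
make_txi <- function(genes, samples) {
  m <- matrix(seq_len(length(genes) * length(samples)),
              nrow = length(genes), dimnames = list(genes, samples))
  list(abundance = m, counts = m * 10, length = m + 500,
       countsFromAbundance = "no")  # assumed value, see note above
}

txi_human   <- make_txi(c("humGene1", "humGene2", "humGene3"),
                        c("human_1", "human_2"))
txi_macaque <- make_txi(c("humGene2", "humGene1"),
                        c("macaque_1", "macaque_2"))

# Hypothetical helper: keep genes present in both lists, align rows by
# gene ID, and bind samples column-wise for each of the three matrices.
merge_txi <- function(a, b) {
  stopifnot(identical(a$countsFromAbundance, b$countsFromAbundance))
  shared <- intersect(rownames(a$counts), rownames(b$counts))
  out <- lapply(
    c(abundance = "abundance", counts = "counts", length = "length"),
    function(nm) cbind(a[[nm]][shared, , drop = FALSE],
                       b[[nm]][shared, , drop = FALSE])
  )
  out$countsFromAbundance <- a$countsFromAbundance
  out
}

txi_merged <- merge_txi(txi_human, txi_macaque)
```

Indexing both matrices by the same `shared` vector is what keeps the rows aligned, so the gene order in the individual lists does not matter. A list with these four elements has the shape that DESeq2's `DESeqDataSetFromTximport()` takes as its `txi` argument.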

Written 7.7 years ago by Michael Love