Question: Tximport error (probably annotation lowercase incompatibility)
6 weeks ago, francoagp wrote:

I'm trying to run tximport as part of the DESeq2 pipeline for an Arabidopsis DE analysis and I hit the error shown further down. I first tried ignoreTxVersion = TRUE and ignoreAfterBar = TRUE, but neither flag helped. I then inspected the tx2gene object to compare it with the quant.sf files and saw this:

> head(tx2gene,10)
        TXNAME    GENEID
1  AT1G01010.1 AT1G01010
2  AT1G03987.1 AT1G03987
3  AT1G01040.1 AT1G01040
4  AT1G01040.2 AT1G01040
5    at1g01046 AT1G01046
6  AT1G03997.1 AT1G03997
7  AT1G01110.2 AT1G01110
8  AT1G01110.1 AT1G01110
9  AT1G01160.1 AT1G01160
10 AT1G01160.2 AT1G01160

I have only done a few DE analyses, and this is my first study in plants. I have never seen this before (lowercase for some transcript names). Could this be causing the error? Why do the GTF and GFF files use this format for some transcripts? How do I solve it: just convert the transcript names to uppercase?

The output and session info are below. Thanks for any response.

> library("tximport")
> library("readr")
> dir <- "/path/to/quants"
> samples <- read.csv(file.path(dir,"sampleTable.csv"), header=TRUE)
> rownames(samples) <- samples$dirName
> files <- file.path(dir, samples$dirName, "quant.sf")
> names(files) <- samples$dirName
> library("GenomicFeatures")
> gtffile <- file.path(dir,"Arabidopsis_thaliana.TAIR10.42.gtf")
> txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
'select()' returned 1:1 mapping between keys and columns
> txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
Example IDs (file): [AT1G19090, AT1G18320, AT5G11100, ...]
Example IDs (tx2gene): [AT1G01010.1, AT1G03987.1, AT1G01040.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS/LAPACK: /nas/longleaf/apps/r/3.5.2/lib/
 [1] LC_CTYPE=es_AR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_AR.UTF-8        LC_COLLATE=es_AR.UTF-8    
 [7] LC_PAPER=es_AR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
 [1] readr_1.3.1            tximport_1.10.1        GenomicFeatures_1.34.7
 [4] AnnotationDbi_1.44.0   Biobase_2.42.0         GenomicRanges_1.34.0  
 [7] GenomeInfoDb_1.18.2    IRanges_2.16.0         S4Vectors_0.20.1      
[10] BiocGenerics_0.28.0   
loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0                  pillar_1.3.1               
 [3] compiler_3.5.2              XVector_0.22.0             
 [5] prettyunits_1.0.2           bitops_1.0-6               
 [7] tools_3.5.2                 progress_1.2.0             
 [9] zlibbioc_1.28.0             biomaRt_2.38.0             
[11] digest_0.6.18               bit_1.1-14                 
[13] jsonlite_1.6                tibble_2.0.1               
[15] lattice_0.20-38             RSQLite_2.1.1              
[17] memoise_1.1.0               pkgconfig_2.0.2            
[19] rlang_0.3.1                 Matrix_1.2-15              
[21] DelayedArray_0.8.0          DBI_1.0.0                  
[23] GenomeInfoDbData_1.2.0      rtracklayer_1.42.1         
[25] stringr_1.4.0               httr_1.4.0                 
[27] Biostrings_2.50.2           hms_0.4.2                  
[29] grid_3.5.2                  bit64_0.9-7                
[31] R6_2.4.0                    XML_3.98-1.17              
[33] BiocParallel_1.16.5         blob_1.1.1                 
[35] magrittr_1.5                matrixStats_0.54.0         
[37] Rsamtools_1.34.1            GenomicAlignments_1.18.1   
[39] SummarizedExperiment_1.12.0 assertthat_0.2.0           
[41] stringi_1.3.1               RCurl_1.95-4.11            
[43] crayon_1.3.4   
Answer: Tximport error (probably annotation lowercase incompatibility)
6 weeks ago, Michael Love (United States) wrote:

This is easy to fix. ignoreTxVersion is for trimming the version numbers from the transcript IDs in the quantification files. Here the version numbers are in the tx2gene table instead, so all you have to do is chop the version numbers in the table. You can do this with:

# Drop everything from the first "." onward, e.g. "AT1G01010.1" -> "AT1G01010"
chop <- function(x) sub("\\..*", "", x)
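
To show how this helper might be applied, here is a minimal sketch on a toy table. The tx2gene rows mirror a few of those printed in the question. The quant_ids vector is a hypothetical stand-in for the IDs in quant.sf, loosely modelled on the unversioned IDs in the error message. The unique() step is my assumption for tidying up the rows that become identical once versions are removed.

```r
# Toy tx2gene table mirroring a few rows from the question (not the full annotation).
tx2gene <- data.frame(
  TXNAME = c("AT1G01010.1", "AT1G01040.1", "AT1G01040.2", "at1g01046"),
  GENEID = c("AT1G01010", "AT1G01040", "AT1G01040", "AT1G01046"),
  stringsAsFactors = FALSE
)

# Hypothetical quantification IDs without version suffixes (assumption for illustration).
quant_ids <- c("AT1G01010", "AT1G01040", "AT1G19090")

# Helper from the answer: drop everything from the first "." onward.
chop <- function(x) sub("\\..*", "", x)

# Count how many quantification IDs appear in the first column before chopping.
matched_before <- sum(quant_ids %in% tx2gene$TXNAME)

# Chop the version suffix in the table, then drop rows that became duplicates.
tx2gene$TXNAME <- chop(tx2gene$TXNAME)
tx2gene <- unique(tx2gene)

# Count the matches again after chopping.
matched_after <- sum(quant_ids %in% tx2gene$TXNAME)
```

With the real table, the chopped tx2gene would then be passed to tximport as in the question, for example tximport(files, type = "salmon", tx2gene = tx2gene). Note that the lowercase ID at1g01046 has no version suffix, so chop leaves it unchanged. Whether it matches still depends on how that ID is written in quant.sf.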

5 weeks ago, francoagp replied:

Thanks Michael, that works for me!