Tximport error (probably annotation lowercase incompatibility)
francoagp • 0

I'm trying to run tximport as part of the DESeq2 pipeline for an Arabidopsis DE analysis and I got the error shown below. First I tried ignoreTxVersion=TRUE and ignoreAfterBar=TRUE, and neither flag helped. Then I printed the tx2gene object to compare it with the quant.sf file, and I see this:

> head(tx2gene,10)
        TXNAME    GENEID
1  AT1G01010.1 AT1G01010
2  AT1G03987.1 AT1G03987
3  AT1G01040.1 AT1G01040
4  AT1G01040.2 AT1G01040
5    at1g01046 AT1G01046
6  AT1G03997.1 AT1G03997
7  AT1G01110.2 AT1G01110
8  AT1G01110.1 AT1G01110
9  AT1G01160.1 AT1G01160
10 AT1G01160.2 AT1G01160

I have only done a few DE analyses, and this is my first study in plants. I have never seen this before (lowercase for some transcript names). Could this be causing the error? Why do the GTF and GFF files use this format for some transcripts? How do I solve it, just by converting the transcript names to uppercase?

I've left the output and session info below. Thanks for any response.

> library("tximport")
> library("readr")
> dir <- "/path/to/quants"
> samples <- read.csv(file.path(dir,"sampleTable.csv"), header=TRUE)
> rownames(samples) <- samples$dirName
> files <- file.path(dir, samples$dirName, "quant.sf")
> names(files) <- samples$dirName
> library("GenomicFeatures")
> gtffile <- file.path(dir,"Arabidopsis_thaliana.TAIR10.42.gtf")
> txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
'select()' returned 1:1 mapping between keys and columns
> txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
Example IDs (file): [AT1G19090, AT1G18320, AT5G11100, ...]
Example IDs (tx2gene): [AT1G01010.1, AT1G03987.1, AT1G01040.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'


> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS/LAPACK: /nas/longleaf/apps/r/3.5.2/lib/libopenblas_haswellp-r0.3.5.so
 [1] LC_CTYPE=es_AR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_AR.UTF-8        LC_COLLATE=es_AR.UTF-8    
 [7] LC_PAPER=es_AR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
 [1] readr_1.3.1            tximport_1.10.1        GenomicFeatures_1.34.7
 [4] AnnotationDbi_1.44.0   Biobase_2.42.0         GenomicRanges_1.34.0  
 [7] GenomeInfoDb_1.18.2    IRanges_2.16.0         S4Vectors_0.20.1      
[10] BiocGenerics_0.28.0   
loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0                  pillar_1.3.1               
 [3] compiler_3.5.2              XVector_0.22.0             
 [5] prettyunits_1.0.2           bitops_1.0-6               
 [7] tools_3.5.2                 progress_1.2.0             
 [9] zlibbioc_1.28.0             biomaRt_2.38.0             
[11] digest_0.6.18               bit_1.1-14                 
[13] jsonlite_1.6                tibble_2.0.1               
[15] lattice_0.20-38             RSQLite_2.1.1              
[17] memoise_1.1.0               pkgconfig_2.0.2            
[19] rlang_0.3.1                 Matrix_1.2-15              
[21] DelayedArray_0.8.0          DBI_1.0.0                  
[23] GenomeInfoDbData_1.2.0      rtracklayer_1.42.1         
[25] stringr_1.4.0               httr_1.4.0                 
[27] Biostrings_2.50.2           hms_0.4.2                  
[29] grid_3.5.2                  bit64_0.9-7                
[31] R6_2.4.0                    XML_3.98-1.17              
[33] BiocParallel_1.16.5         blob_1.1.1                 
[35] magrittr_1.5                matrixStats_0.54.0         
[37] Rsamtools_1.34.1            GenomicAlignments_1.18.1   
[39] SummarizedExperiment_1.12.0 assertthat_0.2.0           
[41] stringi_1.3.1               RCurl_1.95-4.11            
[43] crayon_1.3.4   
arabidopsis tximport gtf annotation • 457 views

This is easy to fix. ignoreTxVersion trims the version numbers from the transcript IDs in the quantification files. Here it is the other way around: the IDs in your quantification files have no version numbers (e.g. AT1G19090), while the IDs in the tx2gene table do (e.g. AT1G01010.1). All you have to do is chop the version numbers off the first column of the table. You can do this with:

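chop <- function(x) sub("\\..*", "", x)  # drop everything from the first "." onward
tx2gene$TXNAME <- chop(tx2gene$TXNAME)   # apply to the ID column before calling tximport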
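
As a rough sketch of how this could be checked before re-running tximport, the snippet below applies chop to a small toy table and measures how many quantification IDs then match. The toy tx2gene and the quant_ids vector are made-up stand-ins, not the poster's data, and the toupper() step is only an assumption about how one might handle the lowercase entry (at1g01046) raised in the question; it is not part of the fix above.

```r
# Toy stand-ins for the real objects (hypothetical values for illustration only)
tx2gene <- data.frame(
  TXNAME = c("AT1G01010.1", "AT1G01040.1", "AT1G01040.2", "at1g01046"),
  GENEID = c("AT1G01010", "AT1G01040", "AT1G01040", "AT1G01046"),
  stringsAsFactors = FALSE
)
# IDs as they would appear in the Name column of quant.sf (assumed, unversioned)
quant_ids <- c("AT1G01010", "AT1G01040", "AT1G01046")

# Same helper as in the answer: remove the first "." and everything after it
chop <- function(x) sub("\\..*", "", x)

# Strip the version suffix; toupper() is an assumed extra step for the
# lowercase IDs and only makes sense if the quant.sf IDs are uppercase
tx2gene$TXNAME <- toupper(chop(tx2gene$TXNAME))

# Chopping collapses isoforms (.1, .2) onto the same ID, so identical rows
# are dropped here to keep one row per ID
tx2gene <- unique(tx2gene)

# Fraction of quantified IDs that now have an entry in tx2gene
matched <- mean(quant_ids %in% tx2gene$TXNAME)
print(matched)
```

With real data, quant_ids could be taken from one of the quantification files, for example read.delim(files[1])$Name, assuming the standard Salmon quant.sf layout. If matched is still well below 1 after chopping, the annotation and the Salmon index probably come from different sources.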

Thanks Michael, that works for me!

