Question

Tximport error (probably annotation lowercase incompatibility)

francoagp

I'm trying to run tximport as part of the DESeq2 pipeline for an Arabidopsis DE analysis and I hit the error shown below. First I tried ignoreTxVersion=TRUE and ignoreAfterBar=TRUE, and neither flag worked. Then I inspected the tx2gene object to compare it with the quant.sf files and saw this:

> head(tx2gene,10)
        TXNAME    GENEID
1  AT1G01010.1 AT1G01010
2  AT1G03987.1 AT1G03987
3  AT1G01040.1 AT1G01040
4  AT1G01040.2 AT1G01040
5    at1g01046 AT1G01046
6  AT1G03997.1 AT1G03997
7  AT1G01110.2 AT1G01110
8  AT1G01110.1 AT1G01110
9  AT1G01160.1 AT1G01160
10 AT1G01160.2 AT1G01160

I have done only a few DE analyses and this is my first study in plants. I have never seen this before (lowercase for some transcript names). Could this be causing the error? Why do the GTF and GFF files use this format for some transcripts? How do I solve it, just by converting the transcript names to uppercase?

The output and session info are below. Thanks for any response.

> library("tximport")
> library("readr")
> dir <- "/path/to/quants"
> samples <- read.csv(file.path(dir,"sampleTable.csv"), header=TRUE)
> rownames(samples) <- samples$dirName
> 
> files <- file.path(dir, samples$dirName, "quant.sf")
> names(files) <- samples$dirName
> library("GenomicFeatures")
> gtffile <- file.path(dir,"Arabidopsis_thaliana.TAIR10.42.gtf")
> txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
  The "phase" metadata column contains non-NA values for features of type
  stop_codon. This information was ignored.
> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
'select()' returned 1:1 mapping between keys and columns
> txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11 
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar,  : 
  None of the transcripts in the quantification files are present
  in the first column of tx2gene. Check to see that you are using
  the same annotation for both.
Example IDs (file): [AT1G19090, AT1G18320, AT5G11100, ...]
Example IDs (tx2gene): [AT1G01010.1, AT1G03987.1, AT1G01040.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'

SessionInfo:

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS/LAPACK: /nas/longleaf/apps/r/3.5.2/lib/libopenblas_haswellp-r0.3.5.so
locale:
 [1] LC_CTYPE=es_AR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=es_AR.UTF-8        LC_COLLATE=es_AR.UTF-8    
 [5] LC_MONETARY=es_AR.UTF-8    LC_MESSAGES=es_AR.UTF-8   
 [7] LC_PAPER=es_AR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C       
attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     
other attached packages:
 [1] readr_1.3.1            tximport_1.10.1        GenomicFeatures_1.34.7
 [4] AnnotationDbi_1.44.0   Biobase_2.42.0         GenomicRanges_1.34.0  
 [7] GenomeInfoDb_1.18.2    IRanges_2.16.0         S4Vectors_0.20.1      
[10] BiocGenerics_0.28.0   
loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0                  pillar_1.3.1               
 [3] compiler_3.5.2              XVector_0.22.0             
 [5] prettyunits_1.0.2           bitops_1.0-6               
 [7] tools_3.5.2                 progress_1.2.0             
 [9] zlibbioc_1.28.0             biomaRt_2.38.0             
[11] digest_0.6.18               bit_1.1-14                 
[13] jsonlite_1.6                tibble_2.0.1               
[15] lattice_0.20-38             RSQLite_2.1.1              
[17] memoise_1.1.0               pkgconfig_2.0.2            
[19] rlang_0.3.1                 Matrix_1.2-15              
[21] DelayedArray_0.8.0          DBI_1.0.0                  
[23] GenomeInfoDbData_1.2.0      rtracklayer_1.42.1         
[25] stringr_1.4.0               httr_1.4.0                 
[27] Biostrings_2.50.2           hms_0.4.2                  
[29] grid_3.5.2                  bit64_0.9-7                
[31] R6_2.4.0                    XML_3.98-1.17              
[33] BiocParallel_1.16.5         blob_1.1.1                 
[35] magrittr_1.5                matrixStats_0.54.0         
[37] Rsamtools_1.34.1            GenomicAlignments_1.18.1   
[39] SummarizedExperiment_1.12.0 assertthat_0.2.0           
[41] stringi_1.3.1               RCurl_1.95-4.11            
[43] crayon_1.3.4

arabidopsis tximport gtf annotation

score 2 · Accepted Answer · 2019-04-05

Michael Love

This is easy to fix. ignoreTxVersion is for trimming the version numbers from the IDs in the quantification files. Here the version numbers are in the tx2gene table instead, while the IDs in the quantification files have none. All you have to do is chop the version numbers off in the table. You can do this with:

chop <- function(x) sub("\\..*","",x)
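
As a minimal sketch of how chop might then be applied, here is a small base R example. The toy tx2gene rows are copied from the head() output in the question; the quant_ids vector is a hypothetical stand-in for the IDs in the quant.sf files, not values read from the real data.

```r
# Pattern from the answer: drop everything from the first "." onward.
chop <- function(x) sub("\\..*", "", x)

# Toy stand-in for the tx2gene table (rows taken from the question's output).
tx2gene <- data.frame(
  TXNAME = c("AT1G01010.1", "AT1G01040.1", "AT1G01040.2", "at1g01046"),
  GENEID = c("AT1G01010", "AT1G01040", "AT1G01040", "AT1G01046"),
  stringsAsFactors = FALSE
)

# Overwrite the first column with the version-free IDs.
tx2gene$TXNAME <- chop(tx2gene$TXNAME)

# Hypothetical IDs, in the same style as the "Example IDs (file)" in the error.
quant_ids <- c("AT1G01010", "AT1G01040", "AT5G11100")

# Check which quantification IDs now have an entry in tx2gene.
matched <- quant_ids %in% tx2gene$TXNAME
print(tx2gene)
print(matched)
```

Entries without a dot, such as at1g01046, pass through chop unchanged, so this step does not touch the lowercase names. Isoforms of the same gene collapse to one ID after chopping, which leaves duplicate rows in the table.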