I'm trying to run tximport as part of the DESeq2 pipeline for an Arabidopsis DE analysis and ran into the error shown below. I first tried ignoreTxVersion = TRUE and ignoreAfterBar = TRUE, but neither flag helped. I then inspected the tx2gene object to compare it with the quant.sf files, and this is what I see:
> head(tx2gene,10)
TXNAME GENEID
1 AT1G01010.1 AT1G01010
2 AT1G03987.1 AT1G03987
3 AT1G01040.1 AT1G01040
4 AT1G01040.2 AT1G01040
5 at1g01046 AT1G01046
6 AT1G03997.1 AT1G03997
7 AT1G01110.2 AT1G01110
8 AT1G01110.1 AT1G01110
9 AT1G01160.1 AT1G01160
10 AT1G01160.2 AT1G01160
I have only done a few DE analyses, and this is my first study in plants. I have never seen this before (lowercase for some transcript names). Could this be what causes the error? Why do the GTF and GFF files use this format for some transcripts? And how do I solve it: simply by converting the transcript names to uppercase?
The output and session info are below. Thanks for any response.
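For reference, the uppercase idea could be sketched as follows. This is only a minimal illustration on a toy copy of three rows from the table above, not a confirmed fix. The helper name `normalize_tx_id` is hypothetical, and dropping the `.1`/`.2` suffix is an assumption that only makes sense if the IDs in quant.sf really carry no suffix.

```r
# Toy copy of a few rows from the tx2gene table shown above.
tx2gene <- data.frame(
  TXNAME = c("AT1G01010.1", "AT1G01040.2", "at1g01046"),
  GENEID = c("AT1G01010", "AT1G01040", "AT1G01046"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of tximport): upper-case the ID and
# drop a trailing ".<digits>" suffix, if there is one.
normalize_tx_id <- function(x) sub("\\.[0-9]+$", "", toupper(x))

tx2gene_norm <- tx2gene
tx2gene_norm$TXNAME <- normalize_tx_id(tx2gene$TXNAME)

# Isoforms of one gene collapse to the same ID after stripping,
# so remove any duplicated rows.
tx2gene_norm <- unique(tx2gene_norm)
print(tx2gene_norm)
```

Note that stripping the suffix collapses isoforms such as AT1G01040.1 and AT1G01040.2 into one ID, so whether this is appropriate depends on how the transcripts were named when the Salmon index was built.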
> library("tximport")
> library("readr")
> dir <- "/path/to/quants"
> samples <- read.csv(file.path(dir,"sampleTable.csv"), header=TRUE)
> rownames(samples) <- samples$dirName
>
> files <- file.path(dir, samples$dirName, "quant.sf")
> names(files) <- samples$dirName
> library("GenomicFeatures")
> gtffile <- file.path(dir,"Arabidopsis_thaliana.TAIR10.42.gtf")
> txdb <- makeTxDbFromGFF(gtffile, format = "gtf", circ_seqs = character())
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(type, phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
> k <- keys(txdb, keytype = "TXNAME")
> tx2gene <- select(txdb, k, "GENEID", "TXNAME")
'select()' returned 1:1 mapping between keys and columns
> txi <- tximport(files, type="salmon", tx2gene=tx2gene, ignoreTxVersion = TRUE, ignoreAfterBar = TRUE)
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 11
Error in summarizeToGene(txi, tx2gene, varReduce, ignoreTxVersion, ignoreAfterBar, :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
Example IDs (file): [AT1G19090, AT1G18320, AT5G11100, ...]
Example IDs (tx2gene): [AT1G01010.1, AT1G03987.1, AT1G01040.1, ...]
This can sometimes (not always) be fixed using 'ignoreTxVersion' or 'ignoreAfterBar'
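The example IDs in the error message already hint at the mismatch: the IDs from the files have no `.1`-style suffix, while those in tx2gene do. A small sketch of how that could be checked, using only the six example IDs printed above (the helper `has_version` is a hypothetical name introduced here for illustration):

```r
# Example IDs copied from the tximport error message above.
file_ids    <- c("AT1G19090", "AT1G18320", "AT5G11100")
tx2gene_ids <- c("AT1G01010.1", "AT1G03987.1", "AT1G01040.1")

# Hypothetical helper: TRUE where an ID ends in ".<digits>".
has_version <- function(x) grepl("\\.[0-9]+$", x)

file_versioned    <- has_version(file_ids)
tx2gene_versioned <- has_version(tx2gene_ids)

cat("file IDs with a suffix:   ", sum(file_versioned), "of", length(file_ids), "\n")
cat("tx2gene IDs with a suffix:", sum(tx2gene_versioned), "of", length(tx2gene_ids), "\n")
```

On the real data, the same check could be run on the `Name` column of one quant.sf file and on `tx2gene$TXNAME`. After any renaming, the fraction of matching IDs could be checked with `%in%`.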
SessionInfo:
> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.6 (Maipo)
Matrix products: default
BLAS/LAPACK: /nas/longleaf/apps/r/3.5.2/lib/libopenblas_haswellp-r0.3.5.so
locale:
[1] LC_CTYPE=es_AR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_AR.UTF-8 LC_COLLATE=es_AR.UTF-8
[5] LC_MONETARY=es_AR.UTF-8 LC_MESSAGES=es_AR.UTF-8
[7] LC_PAPER=es_AR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_AR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] readr_1.3.1 tximport_1.10.1 GenomicFeatures_1.34.7
[4] AnnotationDbi_1.44.0 Biobase_2.42.0 GenomicRanges_1.34.0
[7] GenomeInfoDb_1.18.2 IRanges_2.16.0 S4Vectors_0.20.1
[10] BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 pillar_1.3.1
[3] compiler_3.5.2 XVector_0.22.0
[5] prettyunits_1.0.2 bitops_1.0-6
[7] tools_3.5.2 progress_1.2.0
[9] zlibbioc_1.28.0 biomaRt_2.38.0
[11] digest_0.6.18 bit_1.1-14
[13] jsonlite_1.6 tibble_2.0.1
[15] lattice_0.20-38 RSQLite_2.1.1
[17] memoise_1.1.0 pkgconfig_2.0.2
[19] rlang_0.3.1 Matrix_1.2-15
[21] DelayedArray_0.8.0 DBI_1.0.0
[23] GenomeInfoDbData_1.2.0 rtracklayer_1.42.1
[25] stringr_1.4.0 httr_1.4.0
[27] Biostrings_2.50.2 hms_0.4.2
[29] grid_3.5.2 bit64_0.9-7
[31] R6_2.4.0 XML_3.98-1.17
[33] BiocParallel_1.16.5 blob_1.1.1
[35] magrittr_1.5 matrixStats_0.54.0
[37] Rsamtools_1.34.1 GenomicAlignments_1.18.1
[39] SummarizedExperiment_1.12.0 assertthat_0.2.0
[41] stringi_1.3.1 RCurl_1.95-4.11
[43] crayon_1.3.4
Thanks Michael, that works for me!