tximeta error in fetching genome info
1
0
Entering edit mode
MT • 0
@mt-12730
Last seen 16 months ago

Hello, I am trying to quantify human transcripts mapped with salmon in some RNA sequencing (GSE107467) using tximeta.

This is the code I use to import the annotation file and the transcript file.

dir2 <- system.file("extdata", package="tximeta")
 indexDir <- file.path(dir2, "gencode_v29_idx")
 fastaFTP <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.transcripts.fa.gz"
 gtfPath <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.primary_assembly.annotation.gtf.gz"
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir,
                source="GENCODE",
                organism="Homo sapiens",
                release="29",
                genome="GRCh38",
                fasta=fastaFTP,
                gtf=gtfPath,
                write=FALSE)

and this is the code to import the salmon quantification files in tximeta

dir = '../nowakowski'
setwd(dir)
list.files()
files <-list.files(path='data')
files2 <- file.path(dir, "data", files, "quant.sf")
cond = c(rep('E19',5),rep('E15',4))

list = as.data.frame(cbind(files,files2,cond), stringsAsfactors=FALSE)
colnames(list)=c('names','files','condition')
list[,1] <- gsub('_1.sf', '', list[,1])
file.exists(as.vector(list[,2]))
se <- tximeta(list)

which outputs:

importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 
found matching linked transcriptome:
[ Gencode - Homo sapiens - release 29 ]
loading existing TxDb created: 2019-12-09 21:58:05
Loading required package: GenomicFeatures
generating transcript ranges
fetching genome info
Error in FUN(genome = names(SUPPORTED_UCSC_GENOMES)[idx], circ_seqs = supported_genome$circ_seqs,  : 
  cannot map the following UCSC seqlevel(s) to an NCBI seqlevel: chr8_KZ208915v1_fix, chr15_KN538374v1_fix,
  chr15_KQ031389v1_alt, chr16_KV880768v1_fix, chr12_KZ208916v1_fix, chr14_KZ208920v1_fix, chr7_KZ208913v1_alt,
  chr5_KV575244v1_fix, chr7_KZ208912v1_fix, chr1_KV880763v1_alt, chr12_KN538369v1_fix, chr2_KQ983256v1_alt,
  chr2_KQ031384v1_fix, chr16_KZ559113v1_fix, chr7_KV880765v1_fix, chr1_KQ031383v1_fix, chr1_KN538360v1_fix,
  chr3_KN196475v1_fix, chr4_KV766193v1_alt, chr10_KN538367v1_fix, chr3_KN538364v1_fix, chr3_KV766192v1_fix,
  chr18_KQ090028v1_fix, chr19_KQ458386v1_fix, chr3_KQ031385v1_fix, chr19_KN196484v1_fix, chr2_KN538363v1_fix,
  chr5_KV575243v1_alt, chr13_KN538372v1_fix, chr1_KQ458383v1_alt, chr9_KN196479v1_fix, chr1_KZ208906v1_fix,
  chr6_KQ031387v1_fix, chr12_KQ759760v1_fix, chr3_KN196476v1_fix, chr1_KN538361v1_fix, chr11_KZ559108v1_fix,
  chr3_KZ559103v1_alt, chr11_KZ559110v1_alt, chr19_KV575249v1_alt, chr17_KV766196v1_fix, chr11_KZ559109v1_fix,
  chr1_KQ98

I am using a salmon index obtained from the same transcript file to map the reads... I do not understand. The same code works for mouse RNAseq. This works:

dir2 <- system.file("extdata", package="tximeta")
indexDir <- file.path(dir2, "gencode_vM23_idx")
fastaFTP <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.transcripts.fa.gz"
gtfPath <- "ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M23/gencode.vM23.primary_assembly.annotation.gtf.gz"
suppressPackageStartupMessages(library(tximeta))
makeLinkedTxome(indexDir=indexDir,
                source="GENCODE",
                organism="Mus musculus",
                release="M23",
                genome="GRCm38",
                fasta=fastaFTP,
                gtf=gtfPath,
                write=FALSE)
se <- tximeta(list)

There is a problem with the patched human genome fetched by tximeta: https://support.bioconductor.org/p/117808/

Sequences like chr15KN538374v1fix don't belong to GRCh38 but to recent patched versions of GRCh38 (e.g. GRCh38.p12, which is the latest). Unfortunately it looks like the UCSC folks updated the chromInfo table for hg38 a couple of days ago, and that the table now contains chrom info for GRCh38.p12 (or maybe GRCh38.p11). You can see a dump of this SQL table here:

But i do not know how to solve it. Can anyone help me please?

Thanks, Marco

ps: I add sessionInfo

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /pstore/apps/OpenBLAS/0.2.13-GCC-4.8.4-LAPACK-3.5.0/lib/libopenblas_prescottp-r0.2.13.so

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                  LC_TIME=en_US.UTF-8          
 [4] LC_COLLATE=en_US.UTF-8        LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8           LC_ADDRESS=en_US.UTF-8       
[10] LC_TELEPHONE=en_US.UTF-8      LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicFeatures_1.34.8      apeglm_1.4.2                PoiClaClu_1.0.2.1           pheatmap_1.0.12            
 [5] ggplot2_3.2.1               dplyr_0.8.3                 DESeq2_1.20.0               SummarizedExperiment_1.10.1
 [9] DelayedArray_0.6.4          BiocParallel_1.14.2         matrixStats_0.54.0          GenomicRanges_1.32.6       
[13] GenomeInfoDb_1.16.0         tximeta_1.0.3               STRINGdb_1.22.0             igraph_1.2.4.2             
[17] xlsx_0.6.1                  RColorBrewer_1.1-2          org.Hs.eg.db_3.7.0          org.Mm.eg.db_3.7.0         
[21] AnnotationDbi_1.44.0        IRanges_2.14.10             S4Vectors_0.18.3            Biobase_2.40.0             
[25] BiocGenerics_0.26.0         gplots_3.0.1                Glimma_1.10.1               edgeR_3.22.3               
[29] limma_3.36.2               

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2         htmlTable_1.12           XVector_0.20.0           base64enc_0.1-3         
  [5] rstudioapi_0.10          hash_2.2.6.1             bit64_0.9-7              sqldf_0.4-11            
  [9] splines_3.5.1            tximport_1.10.1          geneplotter_1.58.0       knitr_1.20              
 [13] zeallot_0.1.0            Formula_1.2-3            jsonlite_1.6             Rsamtools_1.32.2        
 [17] annotate_1.58.0          rJava_0.9-10             cluster_2.0.7-1          dbplyr_1.4.2            
 [21] png_0.1-7                readr_1.3.1              compiler_3.5.1           httr_1.4.1              
 [25] backports_1.1.2          assertthat_0.2.0         Matrix_1.3-0             lazyeval_0.2.1          
 [29] acepack_1.4.1            htmltools_0.3.6          prettyunits_1.0.2        tools_3.5.1             
 [33] coda_0.19-1              gtable_0.2.0             glue_1.3.0               GenomeInfoDbData_1.1.0  
 [37] rappdirs_0.3.1           Rcpp_1.0.3               bbmle_1.0.20             vctrs_0.2.0             
 [41] Biostrings_2.48.0        gdata_2.18.0             rtracklayer_1.40.3       stringr_1.4.0           
 [45] xlsxjars_0.6.1           proto_1.0.0              ensembldb_2.6.8          gtools_3.8.1            
 [49] XML_3.98-1.13            MASS_7.3-50              zlibbioc_1.26.0          scales_0.5.0            
 [53] BiocInstaller_1.30.0     hms_0.5.2                ProtGenerics_1.14.0      AnnotationFilter_1.6.0  
 [57] yaml_2.2.0               curl_3.2                 memoise_1.1.0            gridExtra_2.3           
 [61] emdbook_1.3.11           biomaRt_2.36.1           rpart_4.1-13             latticeExtra_0.6-28     
 [65] stringi_1.2.4            RSQLite_2.1.1            genefilter_1.62.0        plotrix_3.7-7           
 [69] checkmate_1.8.5          caTools_1.17.1.1         chron_2.3-52             rlang_0.4.2             
 [73] pkgconfig_2.0.1          bitops_1.0-6             lattice_0.20-35          purrr_0.3.3             
 [77] GenomicAlignments_1.16.0 htmlwidgets_1.5.1        bit_1.1-14               tidyselect_0.2.5        
 [81] plyr_1.8.4               magrittr_1.5             R6_2.2.2                 Hmisc_4.1-1             
 [85] DBI_1.0.0                withr_2.1.2              gsubfn_0.7               pillar_1.4.2            
 [89] foreign_0.8-71           survival_2.42-6          RCurl_1.95-4.11          nnet_7.3-12             
 [93] tibble_2.1.3             crayon_1.3.4             KernSmooth_2.23-15       BiocFileCache_1.6.0     
 [97] progress_1.2.0           locfit_1.5-9.1           grid_3.5.1               data.table_1.11.4       
[101] blob_1.1.1               digest_0.6.15            xtable_1.8-2             numDeriv_2016.8-1       
[105] munsell_0.5.0
rnaseq tximeta • 270 views
ADD COMMENT
2
Entering edit mode
@mikelove
Last seen 19 hours ago
United States

If you note the above linked issue, Hervé Pagès pushed a fix. That was back in GenomeInfoDb_1.18.1. Above you are working with GenomeInfoDb_1.16.0. So you definitely don't have the fix which was applied to a release after the version of code that you have.

I'd recommend updating to the latest version of Bioconductor:

https://bioconductor.org/install/

ADD COMMENT
0
Entering edit mode

Thanks Michael, you are right. I tried to update all packages but GenomeInfoDb gave an error and I did not noticed. I had to update it manually downloading the package. Thanks again now it works like a charm.

ADD REPLY

Login before adding your answer.

Traffic: 457 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6