Hi
I am trying to generate an Ensembl version 92 annotation package using the function fetchTablesFromEnsembl from the ensembldb package:
fetchTablesFromEnsembl(92, species = "human")
but I get this error related to missing Perl modules:
Empty compile time value given to use lib at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 22.
Use of uninitialized value in require at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
Can't locate Bio/EnsEMBL/ApiVersion.pm in @INC (you may need to install the Bio::EnsEMBL::ApiVersion module) (@INC contains: /usr/lib/perl5/5.26/site_perl /usr/share/perl5/site_perl /usr/lib/perl5/5.26/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/5.26/core_perl /usr/share/perl5/core_perl) at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
BEGIN failed--compilation aborted at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
Error in fetchTablesFromEnsembl(92, species = "human") :
  Something went wrong! I'm missing some of the txt files the perl script should have generated.
Thanks in advance.
Session info below.
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] ensembldb_2.2.2 AnnotationFilter_1.2.0 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 Biobase_2.38.0 GenomicRanges_1.30.3
[7] GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0 AnnotationHub_2.10.1 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] SummarizedExperiment_1.8.1 progress_1.1.2 lattice_0.20-35 htmltools_0.3.6
[5] rtracklayer_1.38.3 yaml_2.1.18 interactiveDisplayBase_1.16.0 blob_1.1.1
[9] XML_3.98-1.10 DBI_0.8 BiocParallel_1.12.0 bit64_0.9-7
[13] matrixStats_0.53.1 GenomeInfoDbData_1.0.0 ProtGenerics_1.10.0 stringr_1.3.0
[17] zlibbioc_1.24.0 Biostrings_2.46.0 memoise_1.1.0 biomaRt_2.34.2
[21] httpuv_1.3.6.2 BiocInstaller_1.28.0 curl_3.2 Rcpp_0.12.16
[25] xtable_1.8-2 DelayedArray_0.4.1 XVector_0.18.0 mime_0.5
[29] bit_1.1-12 Rsamtools_1.30.0 RMySQL_0.10.14 digest_0.6.15
[33] stringi_1.1.7 shiny_1.0.5 grid_3.4.3 tools_3.4.3
[37] bitops_1.0-6 magrittr_1.5 RCurl_1.95-4.10 lazyeval_0.2.1
[41] RSQLite_2.1.0 pkgconfig_2.0.1 Matrix_1.2-12 prettyunits_1.0.2
[45] assertthat_0.2.0 httr_1.3.1 R6_2.2.2 GenomicAlignments_1.14.2
[49] compiler_3.4.3
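The error above says Perl cannot find Bio/EnsEMBL/ApiVersion.pm in any of its library directories, so it may help to confirm where (or whether) the Ensembl Perl API is installed before calling fetchTablesFromEnsembl again. Below is a minimal base R sketch of such a check. The helper name has_ensembl_api and the path ~/ensembl/modules are made up for illustration, and the sketch assumes that the `ensemblapi` argument of fetchTablesFromEnsembl is meant to receive this directory (please check ?fetchTablesFromEnsembl for your ensembldb version).

```r
# Hypothetical helper (not part of ensembldb): TRUE if `api_dir` contains the
# module that Perl reported as missing in the error message.
has_ensembl_api <- function(api_dir) {
  nzchar(api_dir) &&
    file.exists(file.path(api_dir, "Bio", "EnsEMBL", "ApiVersion.pm"))
}

# Assumed location of a local Ensembl Perl API checkout; adjust to your system.
api_dir <- path.expand("~/ensembl/modules")
api_found <- has_ensembl_api(api_dir)

# If api_found is TRUE, the directory could then be passed on, e.g.
# (assumption, see the note above):
# fetchTablesFromEnsembl(92, ensemblapi = api_dir, species = "human")
```

If api_found is FALSE, the API is either not installed or lives in a different directory.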
Johannes: Thank you!
Just to have an idea, how long is quite long?
I'm installing the Ensembl core databases locally and it takes ~ 4-5 hours (depends on the species, human takes quite a while). If you're querying the databases at Ensembl it might take even longer.
Just in case you want the quick and easy way out: you can download the human EnsDb package for Ensembl v92 from:
https://www.dropbox.com/s/plne78gvnznwbl7/EnsDb.Hsapiens.v92_2.0.0.tar.gz?dl=0
May the Force be with you.
Hi Johannes,
Thank you so much for providing the link to the latest EnsDb package.
I am trying to analyze some salmon quantification files, and since I used the transcript annotation files from here (release 92) ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz, I was in need of a solution re the EnsDb package. However, after using tximport to import my quant.sf files, I am receiving the following message:
Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.
I wanted to check that the package that you suggested in the dropbox link below is indeed a match to the transcriptome link I posted?
Thank You!
Best, Rina
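This error means that none of the IDs in the quantification files literally match the first column of tx2gene. One thing worth ruling out is a transcript version suffix (such as ".1") present on one side only. The base R sketch below uses made-up IDs and a simple regular expression; it is only a diagnostic and is not meant to reproduce what tximport does internally.

```r
# Example IDs, made up for illustration.
quant_ids   <- c("ENST00000000001.1", "ENST00000000002.3")
tx2gene_ids <- c("ENST00000000001", "ENST00000000002")

# Fraction of quantified transcripts found in tx2gene, first as-is and then
# after dropping a trailing ".<digits>" version suffix.
match_raw      <- mean(quant_ids %in% tx2gene_ids)
match_stripped <- mean(sub("\\.[0-9]+$", "", quant_ids) %in% tx2gene_ids)
```

A low match_raw together with a high match_stripped points to the version suffix as the cause.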
Both the EnsDb from the link above and the cdna fasta file are based on Ensembl release 92, so all transcripts from the cdna fasta file should be in the EnsDb. Note however that transcript IDs in EnsDb databases are without the transcript version (e.g. the ".1" in "ENST00001.1"). Did you use ignoreTxVersion = TRUE?
Thanks! I didn't realize that the EnsDb does not include transcript versions; ignoreTxVersion = TRUE did the trick. I will continue with the analysis as suggested in the vignette.
Hi Johannes, I indeed used the package you provided for my annotations when importing salmon results using tximport; however, I am receiving very few ncRNA. As I used the cdna.all ensembl file, I can't think why the ncRNA would not be included in my index, as well as in the package you sent... any thoughts?
I would assume that most (if not all) ncRNAs are in the ncrna fasta file (e.g. homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz). I've checked and all of the IDs in this file are present in the EnsDb (for Ensembl version 92).
OK... As ncRNA are transcribed, I would assume they were in the cdna.all file... and that the folder with the ncRNA was there in case someone wanted to look at ncRNA only... The problem is I am interested in both coding and non-coding.... Is there a way to verify that the cdna.all doesn't include ncRNA before re-indexing and re-running Salmon?
I just checked the tx IDs in the ncrna and the cdna files and they are not overlapping. So both files contain a different set of genes/transcripts.
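A comparison like the one described above can be sketched in base R as follows. The helper name fasta_ids and the tiny example files are made up for illustration; the sketch assumes headers of the form ">ID description" and reads each file fully into memory, which may be slow for the full human files.

```r
# Hypothetical helper: return the sequence IDs from the header lines of a
# FASTA file (the first whitespace-delimited token after ">").
fasta_ids <- function(path) {
  con <- gzfile(path, open = "rt")  # gzfile() also reads uncompressed files
  on.exit(close(con))
  lines <- readLines(con)
  headers <- lines[startsWith(lines, ">")]
  sub("^>(\\S+).*$", "\\1", headers)
}

# Tiny stand-ins for the cdna and ncrna files (IDs and sequences made up).
cdna  <- tempfile(fileext = ".fa")
ncrna <- tempfile(fileext = ".fa")
writeLines(c(">ENST00000000001.1 cdna", "ACGT",
             ">ENST00000000002.3 cdna", "GGCC"), cdna)
writeLines(c(">ENST00000000003.2 ncrna", "TTAA"), ncrna)

# IDs present in both files; an empty result means the files do not overlap.
shared <- intersect(fasta_ids(cdna), fasta_ids(ncrna))
```

For the real data, cdna and ncrna would instead point to the downloaded .fa.gz files.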
OK. So I could combine them using cat in a linux system?
I guess so. I've never done that but it should work.
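For those who prefer to stay in R, the same concatenation can be sketched in base R, with an added guard against duplicate IDs. The helper name combine_fasta and the example files are made up for illustration; like the thread itself, this is untested on the full Ensembl files and loads everything into memory.

```r
# Hypothetical helper: write the contents of several FASTA files into one
# file, stopping if any sequence ID occurs more than once.
combine_fasta <- function(inputs, output) {
  lines <- unlist(lapply(inputs, function(path) {
    con <- gzfile(path, open = "rt")  # also reads uncompressed files
    on.exit(close(con))
    readLines(con)
  }))
  ids <- sub("^>(\\S+).*$", "\\1", lines[startsWith(lines, ">")])
  if (anyDuplicated(ids) > 0) stop("duplicate sequence IDs across input files")
  writeLines(lines, output)  # output is written uncompressed
  invisible(length(ids))
}

# Tiny stand-ins for the cdna and ncrna files (IDs and sequences made up).
cdna  <- tempfile(fileext = ".fa")
ncrna <- tempfile(fileext = ".fa")
out   <- tempfile(fileext = ".fa")
writeLines(c(">ENST00000000001.1 cdna", "ACGT",
             ">ENST00000000002.3 cdna", "GGCC"), cdna)
writeLines(c(">ENST00000000003.2 ncrna", "TTAA"), ncrna)

n_seqs <- combine_fasta(c(cdna, ncrna), out)
```

The combined file in out could then be used to build a new Salmon index.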