Question

ensembldb error/bug: Can't locate Bio/EnsEMBL/ApiVersion.pm in @INC | fetchTablesFromEnsembl

Ramiro Magno ▴ 100

@ramiro-magno-12376

Last seen 6.6 years ago

CBMR, Faro, Portugal

Hi

I am trying to generate an Ensembl version 92 annotation package using the function fetchTablesFromEnsembl from the ensembldb package:

fetchTablesFromEnsembl(92, species = "human")

but I get this error related to missing Perl modules:

Empty compile time value given to use lib at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 22.
Use of uninitialized value in require at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
Can't locate Bio/EnsEMBL/ApiVersion.pm in @INC (you may need to install the Bio::EnsEMBL::ApiVersion module) (@INC contains:  /usr/lib/perl5/5.26/site_perl /usr/share/perl5/site_perl /usr/lib/perl5/5.26/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/5.26/core_perl /usr/share/perl5/core_perl) at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
BEGIN failed--compilation aborted at /home/rmagno/R/x86_64-pc-linux-gnu-library/3.4/ensembldb/perl/get_gene_transcript_exon_tables.pl line 27.
Error in fetchTablesFromEnsembl(92, species = "human") : 
  Something went wrong! I'm missing some of the txt files the perl script should have generated.

Thanks in advance.

Session info below.

R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ensembldb_2.2.2        AnnotationFilter_1.2.0 GenomicFeatures_1.30.3 AnnotationDbi_1.40.0   Biobase_2.38.0         GenomicRanges_1.30.3  
 [7] GenomeInfoDb_1.14.0    IRanges_2.12.0         S4Vectors_0.16.0       AnnotationHub_2.10.1   BiocGenerics_0.24.0   

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.8.1    progress_1.1.2                lattice_0.20-35               htmltools_0.3.6              
 [5] rtracklayer_1.38.3            yaml_2.1.18                   interactiveDisplayBase_1.16.0 blob_1.1.1                   
 [9] XML_3.98-1.10                 DBI_0.8                       BiocParallel_1.12.0           bit64_0.9-7                  
[13] matrixStats_0.53.1            GenomeInfoDbData_1.0.0        ProtGenerics_1.10.0           stringr_1.3.0                
[17] zlibbioc_1.24.0               Biostrings_2.46.0             memoise_1.1.0                 biomaRt_2.34.2               
[21] httpuv_1.3.6.2                BiocInstaller_1.28.0          curl_3.2                      Rcpp_0.12.16                 
[25] xtable_1.8-2                  DelayedArray_0.4.1            XVector_0.18.0                mime_0.5                     
[29] bit_1.1-12                    Rsamtools_1.30.0              RMySQL_0.10.14                digest_0.6.15                
[33] stringi_1.1.7                 shiny_1.0.5                   grid_3.4.3                    tools_3.4.3                  
[37] bitops_1.0-6                  magrittr_1.5                  RCurl_1.95-4.10               lazyeval_0.2.1               
[41] RSQLite_2.1.0                 pkgconfig_2.0.1               Matrix_1.2-12                 prettyunits_1.0.2            
[45] assertthat_0.2.0              httr_1.3.1                    R6_2.2.2                      GenomicAlignments_1.14.2     
[49] compiler_3.4.3

ensembldb • 4.1k views

ADD COMMENT • link updated 7.8 years ago by Johannes Rainer ★ 2.1k • written 7.8 years ago by Ramiro Magno ▴ 100

score 2 · Accepted Answer · 2018-04-12

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 15 months ago

Italy

Hi Ramiro,

+1 for trying to build an EnsDb on your own. However, as detailed in the vignette, this requires a local installation of the Ensembl Perl API (http://www.ensembl.org/info/docs/api/api_installation.html) along with Perl version 5.18.0. The function uses the Ensembl Perl API to query the Ensembl databases, which is why the script fails when Bio/EnsEMBL/ApiVersion.pm cannot be found in @INC. Note also that building an EnsDb this way takes quite a long time.
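As a quick way to see whether the missing module is visible at all, here is a minimal sketch that looks for Bio/EnsEMBL/ApiVersion.pm in the directories listed in the PERL5LIB environment variable. The helper name `find_perl_module` is made up for illustration. It only checks PERL5LIB (or a list you pass in), not the directories compiled into Perl's @INC, and it does not know where your Ensembl API checkout lives.

```python
import os
from pathlib import Path


def find_perl_module(module, search_dirs=None):
    """Return the first path where `module` exists as a .pm file, or None.

    Hypothetical helper: it only inspects the given directories
    (default: the entries of PERL5LIB), not Perl's built-in @INC.
    """
    if search_dirs is None:
        # PERL5LIB is a standard way to add directories to Perl's @INC.
        search_dirs = os.environ.get("PERL5LIB", "").split(os.pathsep)
    # "Bio::EnsEMBL::ApiVersion" -> "Bio/EnsEMBL/ApiVersion.pm"
    relative = Path(*module.split("::")).with_suffix(".pm")
    for directory in search_dirs:
        if not directory:
            continue  # skip empty entries, e.g. when PERL5LIB is unset
        candidate = Path(directory) / relative
        if candidate.is_file():
            return candidate
    return None


# Prints the module location, or None if PERL5LIB does not lead to it.
print(find_perl_module("Bio::EnsEMBL::ApiVersion"))
```

If this prints None, Perl will most likely not find the module through PERL5LIB either, and the Ensembl API directory still needs to be added to the search path as described in the vignette.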

If you don't want to go through the struggle to install the Ensembl Perl API etc: I could provide you with an EnsDb package. Alternatively, I've just finished uploading EnsDbs for all species in Ensembl v92 to AnnotationHub - but it will take some time for them to become available and you would also need the current devel version of Bioconductor (3.7).

Let me know if you'd like me to provide you with the EnsDb package.

cheers, jo

ADD COMMENT • link 7.8 years ago Johannes Rainer ★ 2.1k

Johannes: Thank you!

Just to have an idea, how long is quite long?

ADD REPLY • link 7.8 years ago Ramiro Magno ▴ 100

I'm installing the Ensembl core databases locally, and even then it takes about 4-5 hours (it depends on the species; human takes quite a while). If you're querying the databases at Ensembl it might take even longer.

ADD REPLY • link 7.8 years ago Johannes Rainer ★ 2.1k

Just in case you want the quick and easy way out: you can download the human EnsDb package for Ensembl v92 from:

https://www.dropbox.com/s/plne78gvnznwbl7/EnsDb.Hsapiens.v92_2.0.0.tar.gz?dl=0

ADD REPLY • link 7.8 years ago Johannes Rainer ★ 2.1k

May the Force be with you.

ADD REPLY • link 7.8 years ago Ramiro Magno ▴ 100

Hi Johannes,
Thank you so much for providing the link to the latest EnsDb package.

I am trying to analyze some salmon quantification files, and since I used the transcript annotation files from here (release 92) ftp://ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz, I was in need of a solution re the EnsDb package. However, after using tximport to import my quant.sf files, I am receiving the following message:

Error in summarizeToGene(txi, tx2gene, ignoreTxVersion, countsFromAbundance) :
None of the transcripts in the quantification files are present
in the first column of tx2gene. Check to see that you are using
the same annotation for both.

I wanted to check whether the package from your Dropbox link above is indeed a match for the transcriptome file I linked.

Thank You!

Best, Rina

ADD REPLY • link 7.7 years ago rbenel ▴ 50

Both the EnsDb from the link above and the cdna fasta file are based on Ensembl release 92, so all transcripts from the cdna fasta file should be in the EnsDb. Note however that transcript IDs in EnsDb databases are without the transcript version (e.g. the ".1" in "ENST00001.1"). Did you use ignoreTxVersion = TRUE?
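To illustrate what the version mismatch looks like, here is a minimal sketch of the matching problem outside of R. The IDs are placeholders in the style of the example above, not real Ensembl transcripts, and the helper names are made up for illustration; this is not how tximport implements ignoreTxVersion, only the same idea.

```python
import re

# Matches a trailing ".<digits>" version suffix, e.g. the ".1" in "ENST00001.1".
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_tx_version(tx_id):
    """Drop the version suffix from a transcript ID, if there is one."""
    return _VERSION_SUFFIX.sub("", tx_id)


def match_ids(quant_ids, annotation_ids):
    """Map each versioned ID to its unversioned form when the annotation has it."""
    annotation = set(annotation_ids)
    matched = {}
    for tx_id in quant_ids:
        bare = strip_tx_version(tx_id)
        if bare in annotation:
            matched[tx_id] = bare
    return matched


# Placeholder IDs: versioned, as in a quantification file built from the cDNA FASTA.
quant_ids = ["ENST00001.1", "ENST00002.3", "ENST00003.1"]
# Placeholder IDs: unversioned, as stored in the annotation.
annotation_ids = ["ENST00001", "ENST00002"]

# Without stripping, no versioned ID is found in the annotation.
print(set(quant_ids) & set(annotation_ids))
# After stripping, the first two IDs match.
print(match_ids(quant_ids, annotation_ids))
```

A direct comparison of the two ID sets is empty, which is the situation the tximport error message describes.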

ADD REPLY • link 7.7 years ago Johannes Rainer ★ 2.1k

Thanks! I didn't realize that the EnsDb does not include transcript versions, the ignoreTxVersion = TRUE did the trick. I will continue with the analysis as suggested in the vignette.

ADD REPLY • link 7.7 years ago rbenel ▴ 50

Hi Johannes, I did use the package you provided for my annotations when importing salmon results with tximport; however, I am getting very few ncRNAs. As I used the cdna.all Ensembl file, I can't think why the ncRNAs would not be included in my index, as well as in the package you sent... any thoughts?

ADD REPLY • link 7.6 years ago rbenel ▴ 50

I would assume that most (if not all) ncRNAs are in the ncrna fasta file (e.g. homo_sapiens/ncrna/Homo_sapiens.GRCh38.ncrna.fa.gz). I've checked and all of the IDs in this file are present in the EnsDb (for Ensembl version 92).

ADD REPLY • link 7.6 years ago Johannes Rainer ★ 2.1k

OK... As ncRNA are transcribed, I would assume they were in the cdna.all file... and that the folder with the ncRNA, was if someone wanted to just look at ncRNA... The problem is I am interested in both coding and non-coding.... Is there a way to verify that the cdna.all doesn't include ncRNA before re-indexing and re-running Salmon?

ADD REPLY • link 7.6 years ago rbenel ▴ 50

I just checked the tx IDs in the ncrna and the cdna files and they are not overlapping. So both files contain a different set of genes/transcripts.
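For anyone who wants to repeat that check, here is one possible sketch. It assumes gzipped FASTA files whose header lines start with ">" followed by the transcript ID as the first whitespace-separated token; the function names are made up for illustration and this is not necessarily how the check above was done.

```python
import gzip


def fasta_ids(path):
    """Collect the first token of every '>' header line in a gzipped FASTA file."""
    ids = set()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # ignore a bare ">" line without an ID
                    ids.add(fields[0])
    return ids


def shared_ids(path_a, path_b):
    """Return the IDs that occur in the headers of both files."""
    return fasta_ids(path_a) & fasta_ids(path_b)
```

Calling `shared_ids("Homo_sapiens.GRCh38.cdna.all.fa.gz", "Homo_sapiens.GRCh38.ncrna.fa.gz")` on local copies of the two files returns an empty set if they do not overlap.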

ADD REPLY • link 7.6 years ago Johannes Rainer ★ 2.1k

OK. So I could combine them using cat on a Linux system?

ADD REPLY • link 7.6 years ago rbenel ▴ 50

I guess so. I've never done that but it should work.
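As a rough sketch of why plain concatenation is reasonable: the gzip format allows several compressed members back to back in one file, so appending the bytes of one .gz file to another (which is what `cat a.fa.gz b.fa.gz > combined.fa.gz` does) gives a file that decompresses to both contents in order. The function names below are made up for illustration, and whether a given downstream tool reads multi-member gzip files is worth checking separately.

```python
import gzip
import shutil


def concat_gzip(inputs, output):
    """Append the raw bytes of each gzipped input to `output`, like `cat`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_records(path):
    """Count FASTA header lines in a gzipped file as a sanity check."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

After combining, `count_records` on the output should equal the sum of the counts of the two inputs.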

ADD REPLY • link 7.6 years ago Johannes Rainer ★ 2.1k