Hi, I am looking for a more recent EnsDb, as the standard one (v86) used with ensembldb is quite outdated and lacks many ENSP IDs and/or their corresponding CDS. The latest Ensembl release is now v105:
https://www.ensembl.org/info/website/archives/assembly.html
I tried to follow the directions in the vignettes under "Getting EnsDb databases", but the most recent one I can find there is v103.
To ensure access to the most up-to-date EnsDb databases:
Make sure your Bioconductor installation is current by running BiocManager::install(), which reports the version in use. (This may require you to update your R version.)
This will let you install the latest AnnotationHub, which contains the most recent EnsDb databases.
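As a rough sketch of the check described above, the following prints the R, Bioconductor, and AnnotationHub versions in use. It assumes BiocManager can be installed from CRAN; the variable names are only illustrative.

```r
# Install BiocManager from CRAN if it is missing
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Versions of R and Bioconductor currently in use
r_ver    <- getRversion()
bioc_ver <- BiocManager::version()
message("R ", r_ver, ", Bioconductor ", bioc_ver)

# Version of AnnotationHub, if it is installed
if (requireNamespace("AnnotationHub", quietly = TRUE))
    message("AnnotationHub ", packageVersion("AnnotationHub"))
```

If the Bioconductor version reported is older than the current release, the newest AnnotationHub records will probably not be visible until R and Bioconductor are updated.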
I've been caught out by this before. It's confusing to still find those old packages, but at some point ensembldb moved away from the individual annotation packages and onto the Bioconductor AnnotationHub.
You can find the Ensembl 105 annotation on there.
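A minimal sketch of looking up and loading the Ensembl 105 EnsDb from AnnotationHub might look like the following. It assumes a Bioconductor release recent enough to list the record (3.14 according to this thread), and it uses the record ID AH98047 that appears later in the thread; the query terms are my own choice.

```r
library(AnnotationHub)
library(ensembldb)

# Connect to the hub (metadata is downloaded and cached on first use)
ah <- AnnotationHub()

# List EnsDb records for human, Ensembl release 105
hits <- query(ah, c("EnsDb", "Homo sapiens", "105"))
hits

# Load the EnsDb by its AnnotationHub ID (served from the local cache after
# the first download)
edb <- ah[["AH98047"]]
edb
```

For another species, changing the species name in the query and picking the ID from the printed hits should work the same way.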
... and to get you started with that (retrieving the database for your species of interest, and subsequent usage), you may want to have a look at the code posted in this thread: "EnsDb.Rnorvegicus for Rnor6".
Note that, alternatively, you can make a TxDb object:
library(GenomicFeatures)
txdb <- makeTxDbFromEnsembl("Homo sapiens", release=105)
# Fetch transcripts and genes from Ensembl ... OK
#   (fetched 268255 transcripts from 69329 genes)
# Fetch exons and CDS from Ensembl ... OK
# Fetch chromosome names and lengths from Ensembl ...OK
# Gather the metadata ... OK
# Make the TxDb object ... OK
txdb
# TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: Ensembl
# Organism: Homo sapiens
# Ensembl release: 105
# Ensembl database: homo_sapiens_core_105_38
# MySQL server: ensembldb.ensembl.org
# Full dataset: yes
# Nb of transcripts: 268255
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2022-01-25 15:53:41 -0800 (Tue, 25 Jan 2022)
# GenomicFeatures version at creation time: 1.47.7
# RSQLite version at creation time: 2.2.9
# DBSCHEMAVERSION: 1.2
TxDb objects contain the same set of genomic features (genes/transcripts/exons/CDS) as EnsDb objects. However, the former only import and store the greatest common denominator of what's provided by UCSC, Ensembl, and GTF/GFF3 files, while the latter import additional Ensembl-specific attributes for each feature. For many use cases, the two types of objects are (almost) interchangeable, so maybe a TxDb will do for your use case.
I was using the proteinToGenome() function of the ensembldb package to determine the genomic position of a premature termination codon caused by a frameshift variant, based on the variant's HGVSp annotation. I was unable to find a similar function in GenomicFeatures, but maybe I am wrong.
Indeed, that's something that GenomicFeatures didn't have so far. Today I added the proteinToGenome() generic + a couple of methods to GenomicFeatures 1.47.10 (BioC devel). Loosely modeled on ensembldb::proteinToGenome(). See ?GenomicFeatures::proteinToGenome for the details.
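For readers unfamiliar with the ensembldb function mentioned here, a hedged sketch of mapping a protein-relative range to genomic coordinates could look like this. The protein ID (the canonical TP53 protein) and the residue range 5-9 are placeholders chosen purely for illustration; they are not taken from this thread, and the EnsDb is assumed to be the AnnotationHub record AH98047 used elsewhere in the thread.

```r
library(AnnotationHub)
library(ensembldb)
library(IRanges)

# Load the Ensembl 105 EnsDb from AnnotationHub (ID as used in this thread)
ah  <- AnnotationHub()
edb <- ah[["AH98047"]]

# Amino acid positions within a protein; the range name must be the Ensembl
# protein ID. Both the ID and the positions are illustrative placeholders.
prng <- IRanges(start = 5, end = 9, names = "ENSP00000269305")

# Map the protein-relative range to genomic coordinates;
# the result holds one element per input range
gnm <- proteinToGenome(prng, edb)
gnm
```

For a premature stop codon, the input range would instead be the residue position reported in the HGVSp annotation.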
That is what I was using. AnnotationHub() is explained in the ensembldb vignettes under "Getting EnsDb databases". Here is my code:
The most recent one is v103.
I am trying to get AnnotationHub or ensembldb to create one for v105.
What version of R and AnnotationHub are you using? Release 105 is definitely available for the current Bioconductor release (3.14) with R 4.1.
Yup, my R is outdated by a single release and I can't get Bioconductor 3.14 because of that.
BiocManager::install() outputs:
When I try BiocManager::install("EnsDb.Hsapiens.v105"), I get "package 'EnsDb.Hsapiens.v105' is not available for Bioconductor version '3.14'" in R v4.1.
The data is available via the AnnotationHub.
This solution has been working for me. However, today I was surprised by an error message when trying to print the data object: "Error: no such table: gene"
I also get lots of error messages when trying to use data for downstream applications, such as in locuszoomr::locus.
Update: data <- ah[["AH98047", force = TRUE]] fixed the problem.
The force=TRUE option should be used sparingly and really only once. This option forces a re-download of the data (costing us egress) instead of using the locally cached copy. It's acceptable when necessary, but I wanted to point out this distinction.