Where can I find EnsDb.Hsapiens.v105?
@8f2f6acd
Last seen 4 months ago
United States

Hi, I am looking for a more recent EnsDb, as the standard one (v86) used with ensembldb is quite outdated and lacks many ENSP protein IDs and/or their corresponding CDS. The most recent Ensembl release is now v105: https://www.ensembl.org/info/website/archives/assembly.html

I tried to follow the directions from the vignettes under "Getting EnsDb databases" but the most recent is v103.

I was going to attempt to make the EnsDb, but apparently it is not that simple: ensembldb error/bug: Can't locate Bio/EnsEMBL/ApiVersion.pm in @INC | fetchTablesFromEnsembl

Thanks much!

@8f2f6acd
Last seen 4 months ago
United States

To ensure access to the most up-to-date EnsDb databases:

  1. Make sure your Bioconductor installation is up to date. Running BiocManager::install() reports the Bioconductor version in use. Note that updating Bioconductor may require updating your R version first.

  2. With a current Bioconductor you can install the current AnnotationHub, which provides the most recent EnsDb databases.
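The two steps above can be sketched in R roughly as follows. This is only a sketch under stated assumptions: `needs_r_upgrade()` is a hypothetical helper written for this example, and the pairing of R 4.1 with Bioconductor 3.14 is taken from later in this thread. The install calls are left commented out so that the block has no side effects.

```r
# Hypothetical helper (not part of any package): TRUE if the given R version
# is older than the version required by the Bioconductor release you want.
# Assumption from this thread: Bioconductor 3.14 pairs with R 4.1.
needs_r_upgrade <- function(r_version = getRversion(), required = "4.1.0") {
    package_version(as.character(r_version)) < package_version(required)
}

needs_r_upgrade("4.0.5")   # TRUE: R 4.0.5 is too old for Bioconductor 3.14
needs_r_upgrade()          # checks the R session you are running

# Once R itself is recent enough, something along these lines should work:
# install.packages("BiocManager")
# BiocManager::version()                  # Bioconductor release in use
# BiocManager::install(version = "3.14")  # switch to a specific release
# BiocManager::install("AnnotationHub")
```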

Mike Smith ★ 6.4k
@mike-smith
Last seen 4 hours ago
EMBL Heidelberg

I've been caught out by this before. It's confusing to still find those old packages, but at some point ensembldb moved away from the individual annotation packages and onto the Bioconductor AnnotationHub.

You can find the Ensembl 105 annotation there, e.g.:

library(AnnotationHub)
ah <- AnnotationHub()
#> snapshotDate(): 2021-10-20

query(ah, c("EnsDb", "v105"))
#> AnnotationHub with 243 records
#> # snapshotDate(): 2021-10-20
#> # $dataprovider: Ensembl
#> # $species: Zonotrichia albicollis, Zalophus californianus, Xiphophorus macu...
#> # $rdataclass: EnsDb
#> # additional mcols(): taxonomyid, genome, description,
#> #   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#> #   rdatapath, sourceurl, sourcetype 
#> # retrieve records with, e.g., 'object[["AH97950"]]' 
#> 
#>             title                                         
#>   AH97950 | Ensembl 105 EnsDb for Anser brachyrhynchus    
#>   AH97951 | Ensembl 105 EnsDb for Astatotilapia calliptera
#>   AH97952 | Ensembl 105 EnsDb for Anolis carolinensis     
#>   AH97953 | Ensembl 105 EnsDb for Amphilophus citrinellus 
#>   AH97954 | Ensembl 105 EnsDb for Amazona collaria        
#>   ...       ...                                           
#>   AH98188 | Ensembl 105 EnsDb for Xiphophorus couchianus  
#>   AH98189 | Ensembl 105 EnsDb for Xiphophorus maculatus   
#>   AH98190 | Ensembl 105 EnsDb for Xenopus tropicalis      
#>   AH98191 | Ensembl 105 EnsDb for Zonotrichia albicollis  
#>   AH98192 | Ensembl 105 EnsDb for Zalophus californianus

Hopefully the AnnotationHub documentation should help you work out how to retrieve the species you want.
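As a possible starting point, the 243-record query can be narrowed to one species by adding the species name as a further search term. This is a sketch that assumes a Bioconductor release whose hub snapshot already contains Ensembl 105, and that the extra term leaves a single human record; it is worth printing `hs` to confirm before loading.

```r
library(AnnotationHub)

ah <- AnnotationHub()

# All search terms must match, so this keeps only the human v105 EnsDb.
hs <- query(ah, c("EnsDb", "v105", "Homo sapiens"))
hs   # inspect the hit(s) before loading anything

# Download (cached after the first call) and load the first matching record.
edb <- hs[[1]]
edb
```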


... and to get you started with that (retrieving database for species of interest) and subsequent usage you may want to have a look at the code posted in this thread: EnsDb.Rnorvegicus for Rnor6


That is what I was using. AnnotationHub() is explained in the ensembldb vignette under "Getting EnsDb databases". Here is my code:

library(AnnotationHub)
## Load the annotation resource.
ah <- AnnotationHub()
## Query for available H.Sapiens EnsDb databases
ahDb <- query(ah, pattern = c("Homo Sapiens", "EnsDb"))

The most recent one is v103.

I am trying to get AnnotationHub or ensembldb to create one for v105.
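To see which human releases a given installation can actually reach, it may help to print the hub snapshot date next to the record titles. The sketch below only uses the query from the post above; the `releases` data frame is an illustrative name. An older Bioconductor release is tied to an older snapshot, which would explain a list that stops at v103.

```r
library(AnnotationHub)

ah <- AnnotationHub()

# The snapshot date is tied to the installed Bioconductor release.
snapshotDate(ah)

# Same query as in the post above.
ahDb <- query(ah, pattern = c("Homo sapiens", "EnsDb"))

# One row per record: hub ID, title (includes the Ensembl release), date added.
releases <- data.frame(
    id    = names(ahDb),
    title = ahDb$title,
    added = ahDb$rdatadateadded
)
releases
```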


What version of R and AnnotationHub are you using? Release 105 is definitely available for the current Bioconductor release (3.14) with R 4.1.


Yup, my R is outdated by a single release and I can't get Bioconductor 3.14 because of that. BiocManager::install() outputs:

Bioconductor version 3.12 (BiocManager 1.30.16), R 4.0.5 (2021-03-31)

When I try BiocManager::install("EnsDb.Hsapiens.v105"), I get "package 'EnsDb.Hsapiens.v105' is not available for Bioconductor version '3.14'" in R v4.1.


The data is available via the AnnotationHub.

> BiocManager::install("AnnotationHub")
> library(AnnotationHub)
> ah = AnnotationHub()
snapshotDate(): 2024-02-09
> query(ah, "EnsDb.Hsapiens.v105")
AnnotationHub with 1 record
# snapshotDate(): 2024-02-09
# names(): AH98047
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2021-10-20
# $title: Ensembl 105 EnsDb for Homo sapiens
# $description: Gene and protein annotations for Homo sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("105", "Annotation", "AnnotationHubSoftware", "Coverage",
#   "DataImport", "EnsDb", "Ensembl", "Gene", "Protein", "Sequencing",
#   "Transcript") 
# retrieve record with 'object[["AH98047"]]' 
> data <- ah[["AH98047"]]
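After loading the record, a quick sanity check can confirm that the object really is the release and organism you asked for. This is a small sketch using accessor functions from ensembldb; the variable name `edb` is arbitrary.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
edb <- ah[["AH98047"]]   # record ID taken from the query output above

# Confirm the Ensembl release and organism stored in the database metadata.
ensemblVersion(edb)
organism(edb)

# A first look at the gene annotations as a GRanges object.
genes(edb)
```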
@herve-pages-1542
Last seen 1 hour ago
Seattle, WA, United States

Note that, alternatively, you can make a TxDb object:

library(GenomicFeatures)

txdb <- makeTxDbFromEnsembl("Homo sapiens", release=105)
# Fetch transcripts and genes from Ensembl ... OK
#   (fetched 268255 transcripts from 69329 genes)
# Fetch exons and CDS from Ensembl ... OK
# Fetch chromosome names and lengths from Ensembl ...OK
# Gather the metadata ... OK
# Make the TxDb object ... OK

txdb
# TxDb object:
# Db type: TxDb
# Supporting package: GenomicFeatures
# Data source: Ensembl
# Organism: Homo sapiens
# Ensembl release: 105
# Ensembl database: homo_sapiens_core_105_38
# MySQL server: ensembldb.ensembl.org
# Full dataset: yes
# Nb of transcripts: 268255
# Db created by: GenomicFeatures package from Bioconductor
# Creation time: 2022-01-25 15:53:41 -0800 (Tue, 25 Jan 2022)
# GenomicFeatures version at creation time: 1.47.7
# RSQLite version at creation time: 2.2.9
# DBSCHEMAVERSION: 1.2

TxDb objects contain the same set of genomic features (genes/transcripts/exons/CDS) as EnsDb objects. However, the former only import and store the greatest common denominator of what's provided by UCSC, Ensembl, and GTF/GFF3 files, while the latter import additional Ensembl-specific attributes for each feature. For many use cases, the two types of objects are (almost) interchangeable, so maybe a TxDb will do for your use case:

ensdb <- ah[["AH98047"]]
ensdb
# EnsDb for Ensembl:
# |Backend: SQLite
# |Db type: EnsDb
# |Type of Gene ID: Ensembl Gene ID
# |Supporting package: ensembldb
# |Db created by: ensembldb package from Bioconductor
# |script_version: 0.3.7
# |Creation time: Sat Dec 18 14:48:15 2021
# |ensembl_version: 105
# |ensembl_host: localhost
# |Organism: Homo sapiens
# |taxonomy_id: 9606
# |genome_build: GRCh38
# |DBSCHEMAVERSION: 2.2
# | No. of genes: 69329.
# | No. of transcripts: 268255.
# |Protein data available.

ex_by_tx1 <- exonsBy(txdb, "tx", use.names=TRUE)
ex_by_tx2 <- exonsBy(ensdb, "tx")

length(ex_by_tx1)
# [1] 268255

length(ex_by_tx2)
# [1] 268255

setequal(names(ex_by_tx1), names(ex_by_tx2))
# [1] TRUE

ex_by_tx1[["ENST00000000412"]]
# GRanges object with 7 ranges and 3 metadata columns:
#       seqnames          ranges strand |   exon_id       exon_name exon_rank
#          <Rle>       <IRanges>  <Rle> | <integer>     <character> <integer>
#   [1]       12 8949488-8949645      - |    546622 ENSE00001348389         1
#   [2]       12 8946229-8946405      - |    546616 ENSE00003523177         2
#   [3]       12 8945418-8945584      - |    546610 ENSE00003631241         3
#   [4]       12 8943801-8943910      - |    546604 ENSE00003492441         4
#   [5]       12 8943405-8943535      - |    546601 ENSE00000717490         5
#   [6]       12 8942416-8942542      - |    546596 ENSE00003610229         6
#   [7]       12 8940361-8941940      - |    546588 ENSE00002254457         7
#   -------
#   seqinfo: 1963 sequences (1 circular) from an unspecified genome

ex_by_tx2[["ENST00000000412"]]
# GRanges object with 7 ranges and 2 metadata columns:
#       seqnames          ranges strand |         exon_id exon_rank
#          <Rle>       <IRanges>  <Rle> |     <character> <integer>
#   [1]       12 8949488-8949645      - | ENSE00001348389         1
#   [2]       12 8946229-8946405      - | ENSE00003523177         2
#   [3]       12 8945418-8945584      - | ENSE00003631241         3
#   [4]       12 8943801-8943910      - | ENSE00003492441         4
#   [5]       12 8943405-8943535      - | ENSE00000717490         5
#   [6]       12 8942416-8942542      - | ENSE00003610229         6
#   [7]       12 8940361-8941940      - | ENSE00002254457         7
#   -------
#   seqinfo: 456 sequences from GRCh38 genome
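To illustrate the Ensembl-specific attributes that an EnsDb carries beyond the shared core, the sketch below lists the available columns and pulls a few of them for the transcript used above. The choice of columns is just an example; `listColumns()` shows what the database actually offers.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
ensdb <- ah[["AH98047"]]

# Columns that can be requested from this EnsDb.
listColumns(ensdb)

# Ensembl-specific annotations for the transcript shown above.
tx <- transcripts(
    ensdb,
    filter  = ~ tx_id == "ENST00000000412",
    columns = c("tx_id", "tx_biotype", "gene_name", "protein_id")
)
tx
```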

Cheers,

H.


I was using the proteinToGenome() function of the ensembldb package to determine the genomic position of a premature termination codon caused by a frameshift variant, using the HGVSp annotation for the variant. I was unable to find a similar function in GenomicFeatures, but maybe I am wrong.
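For readers unfamiliar with it, a call to ensembldb's proteinToGenome() looks roughly like the sketch below. The transcript is the one used earlier in this thread and residue 10 is an arbitrary illustrative position, not a real variant. The mapping is relative to the reference protein sequence, so this is not a full treatment of frameshift variants.

```r
library(AnnotationHub)
library(ensembldb)
library(IRanges)

ah <- AnnotationHub()
ensdb <- ah[["AH98047"]]

# Look up the protein ID encoded by the example transcript.
prot <- proteins(ensdb,
                 filter  = ~ tx_id == "ENST00000000412",
                 columns = c("tx_id", "protein_id"))

# Residue 10 is an arbitrary position chosen for illustration.
# The name of the range must be the Ensembl protein ID.
aa <- IRanges(start = 10, end = 10, names = prot$protein_id[1])

# Map the protein-relative position to genomic coordinates.
gnm <- proteinToGenome(aa, ensdb)
gnm
```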


Indeed, that's something that GenomicFeatures didn't have so far. Today I added the proteinToGenome() generic + a couple of methods to GenomicFeatures 1.47.10 (BioC devel). Loosely modeled on ensembldb::proteinToGenome(). See ?GenomicFeatures::proteinToGenome for the details.

H.

