Sure, that information is also available within ensembldb's EnsDb databases. Ideally, you should get them from AnnotationHub, as shown in the example below; that way you can get the EnsDb database for each species and any Ensembl release.
First we get the EnsDb for Homo sapiens and Ensembl release 100:
> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2020-10-27
> query(ah, "EnsDb.Hsapiens.v100")
AnnotationHub with 1 record
# snapshotDate(): 2020-10-27
# names(): AH79689
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2020-04-27
# $title: Ensembl 100 EnsDb for Homo sapiens
# $description: Gene and protein annotations for Homo sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("100", "AHEnsDbs", "Annotation", "EnsDb", "Ensembl", "Gene",
# "Protein", "Transcript")
# retrieve record with 'object[["AH79689"]]'
> edb <- ah[["AH79689"]]
loading from cache
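If you need a different species or release, the same kind of lookup should work with other search terms. This is only a sketch: the search terms are an example, and the record IDs you get back depend on your AnnotationHub snapshot, so pick the ID from your own query result.

```r
library(AnnotationHub)

ah <- AnnotationHub()

## Search terms are illustrative; all of them have to match a record.
hits <- query(ah, c("EnsDb", "Mus musculus"))

## List title and genome build of the matching records; the row names
## are the AnnotationHub IDs to use with ah[["<ID>"]].
mcols(hits)[, c("title", "genome")]
```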
You can then get gene annotations using the genes method:
> genes(edb)
GRanges object with 68008 ranges and 8 metadata columns:
seqnames ranges strand | gene_id
<Rle> <IRanges> <Rle> | <character>
ENSG00000223972 1 11869-14409 + | ENSG00000223972
ENSG00000227232 1 14404-29570 - | ENSG00000227232
... ... ... ... . ...
ENSG00000231514 Y 26626520-26627159 - | ENSG00000231514
ENSG00000235857 Y 56855244-56855488 + | ENSG00000235857
gene_name gene_biotype seq_coord_system
<character> <character> <character>
ENSG00000223972 DDX11L1 transcribed_unproces.. chromosome
ENSG00000227232 WASH7P unprocessed_pseudogene chromosome
... ... ... ...
ENSG00000231514 CCNQP2 processed_pseudogene chromosome
ENSG00000235857 CTBP2P1 processed_pseudogene chromosome
description gene_id_version symbol entrezid
<character> <character> <character> <list>
ENSG00000223972 DEAD/H-box helicase .. ENSG00000223972.5 DDX11L1 <NA>
ENSG00000227232 WASP family homolog .. ENSG00000227232.5 WASH7P <NA>
... ... ... ... ...
ENSG00000231514 CCNQ pseudogene 2 [S.. ENSG00000231514.1 CCNQP2 <NA>
ENSG00000235857 CTBP2 pseudogene 1 [.. ENSG00000235857.1 CTBP2P1 <NA>
-------
seqinfo: 454 sequences from GRCh38 genome
The gene description, if available, is in the metadata column "description". Note also that you could retrieve the results as a data.frame by setting the parameter return.type = "data.frame".
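As a minimal sketch of that last point, the call below returns a data.frame restricted to a few columns and to a single gene. The column selection and the gene name (WASH7P, taken from the output above) are only examples; AH79689 is the record loaded earlier.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
edb <- ah[["AH79689"]]  # Ensembl 100 EnsDb for Homo sapiens, as above

## Return a data.frame instead of a GRanges, keep only selected columns,
## and restrict the result to one gene (example filter).
gene_df <- genes(edb,
                 columns = c("gene_id", "gene_name", "description"),
                 filter = GeneNameFilter("WASH7P"),
                 return.type = "data.frame")
gene_df
```

Dropping the filter argument gives the description for all genes in the database.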