keys() fails on EnsDb.Hsapiens.v75 on Bioc 3.5
enricoferrero ▴ 660
@enricoferrero-6037
Last seen 3.0 years ago
Switzerland

Since upgrading to Bioconductor 3.5 I get this error when running AnnotationDbi::keys() on EnsDb.Hsapiens.v75 or other ensembldb objects:

> library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: AnnotationFilter
Warning messages:
1: multiple methods tables found for ‘rowSums’
2: multiple methods tables found for ‘colSums’
3: multiple methods tables found for ‘rowMeans’
4: multiple methods tables found for ‘colMeans’
> keys(EnsDb.Hsapiens.v75)
Error in validObject(.Object) :
  invalid class "AnnotationFilterList" object: superclass "vectorORfactor" not defined in the environment of the object's class
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRblas.so
LAPACK: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] EnsDb.Hsapiens.v75_2.1.0 ensembldb_2.0.1          AnnotationFilter_1.0.0
 [4] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0     Biobase_2.36.2
 [7] GenomicRanges_1.28.1     GenomeInfoDb_1.12.0      IRanges_2.10.1
[10] S4Vectors_0.14.1         BiocGenerics_0.22.0      nvimcom_0.9-25

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.26.0
 [3] compiler_3.4.0                XVector_0.16.0
 [5] AnnotationHub_2.8.1           ProtGenerics_1.8.0
 [7] bitops_1.0-6                  tools_3.4.0
 [9] zlibbioc_1.22.0               biomaRt_2.32.0
[11] digest_0.6.12                 RSQLite_1.1-2
[13] memoise_1.1.0                 lattice_0.20-35
[15] Matrix_1.2-10                 shiny_1.0.3
[17] DelayedArray_0.2.2            DBI_0.6-1
[19] yaml_2.1.14                   GenomeInfoDbData_0.99.0
[21] rtracklayer_1.36.0            httr_1.2.1
[23] Biostrings_2.44.0             grid_3.4.0
[25] R6_2.2.1                      XML_3.98-1.7
[27] BiocParallel_1.10.1           htmltools_0.3.6
[29] Rsamtools_1.28.0              matrixStats_0.52.2
[31] GenomicAlignments_1.12.1      SummarizedExperiment_1.6.1
[33] xtable_1.8-2                  mime_0.5
[35] interactiveDisplayBase_1.14.0 httpuv_1.3.3
[37] RCurl_1.95-4.8                lazyeval_0.2.0

keys() works fine on OrgDb objects such as org.Hs.eg.db.

Any fix or workaround for getting the keys out of this object?

Thanks!

P.S.: are ensembldb EnsDb objects ever going to be supported by the Bioconductor Core Team?

Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 28 days ago
Italy

Hi,

Can you please provide the output of sessionInfo()? I have no problem with the keys() call. To get the Ensembl gene IDs out of an EnsDb object you can also use genes():

library(EnsDb.Hsapiens.v75)
gene_ids <- genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id

head(gene_ids)
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000243485" "ENSG00000237613"
[5] "ENSG00000268020" "ENSG00000240361"
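
For comparison, once the packages load cleanly, keys() on an EnsDb object also accepts a keytype argument. The following is only a minimal sketch, assuming EnsDb.Hsapiens.v75 is installed; the has_ensdb guard is an illustrative name and skips everything if the package is missing.

```r
# Sketch: two ways to get Ensembl gene IDs from an EnsDb object.
# Assumption: EnsDb.Hsapiens.v75 is installed; otherwise nothing is run.
has_ensdb <- requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)

if (has_ensdb) {
    library(EnsDb.Hsapiens.v75)
    edb <- EnsDb.Hsapiens.v75

    # Key types supported by this database (gene, transcript, ... IDs)
    print(keytypes(edb))

    # Gene IDs through the AnnotationDbi-style interface
    gene_ids <- keys(edb, keytype = "GENEID")
    print(head(gene_ids))

    # Gene IDs through genes(), returned as a data.frame
    # instead of the default GRanges
    gene_df <- genes(edb, columns = "gene_id", return.type = "data.frame")
    print(head(gene_df$gene_id))
}
```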

Regarding the support of EnsDb by Bioconductor: for each Ensembl release I'll build EnsDb databases for all species in Ensembl. They will then be made available through AnnotationHub:

library(AnnotationHub)
ah <- AnnotationHub()

query(ah, c("EnsDb"))
AnnotationHub with 136 records
# snapshotDate(): 2017-04-25
# $dataprovider: Ensembl
# $species: Ailuropoda Melanoleuca, Anas Platyrhynchos, Anolis Carolinensis,...
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53185"]]'

            title                                      
  AH53185 | Ensembl 87 EnsDb for Anolis Carolinensis   
  AH53186 | Ensembl 87 EnsDb for Ailuropoda Melanoleuca
  AH53187 | Ensembl 87 EnsDb for Astyanax Mexicanus    
  ...       ...                                        
  AH53754 | Ensembl 88 EnsDb for Vicugna Pacos         
  AH53755 | Ensembl 88 EnsDb for Xiphophorus Maculatus
  AH53756 | Ensembl 88 EnsDb for Xenopus Tropicalis   

Hi Johannes, thanks for the quick reply. I updated the post with the sessionInfo() output.

Does that mean that the standalone packages are no longer maintained? What's the recommended way to load an EnsDb object for a particular version of Ensembl?

Thanks!


The standalone packages are "stable". But be aware that e.g. EnsDb.Hsapiens.v75 is built on Ensembl release 75, so it will never change and never be updated. To get the human gene definitions from Ensembl release 87 you can:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2017-04-25

## Search for the corresponding data

> query(ah, c("EnsDb", "87", "Homo sapiens"))
AnnotationHub with 1 record
# snapshotDate(): 2017-04-25
# names(): AH53211
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2017-02-07
# $title: Ensembl 87 EnsDb for Homo Sapiens
# $description: Gene and protein annotations for Homo Sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein",
#   "Annotation", "87", "AHEnsDbs")
# retrieve record with 'object[["AH53211"]]'

## Fetch the resource:
> edb <- ah[["AH53211"]]
downloading from 'https://annotationhub.bioconductor.org/fetch/59949'
retrieving 1 resource
  |======================================================================| 100%
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.

This is cool, but is there a way to grab the EnsDb object without having to look up its name in the AnnotationHub (e.g. AH53211)?

BTW, I didn't mean to hijack my own thread: I still get that error when trying to access the EnsDb objects with keys() or genes().


Yes, you can directly access it with the name - I would expect the ID to be sort of stable, but I'm not sure.

> library(AnnotationHub)
> edb <- AnnotationHub()[["AH53211"]]
snapshotDate(): 2017-04-25
require("ensembldb")
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
>

You can do the same with query(), as that returns an AnnotationHub object from which the resource can be extracted with [[ (instead of specifying the package name as below, you could also use separate keywords for EnsDb, species and Ensembl version):

> edb <- query(AnnotationHub(), "EnsDb.Hsapiens.v87")[[1]]
snapshotDate(): 2017-04-25
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
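
The lookup by keywords can be wrapped in a small function so the AH identifier never has to be typed. This is only a sketch: ensdb_terms() and fetch_ensdb() are hypothetical helpers, not part of AnnotationHub or ensembldb, and the exactly-one-hit check is an assumption added here because query() matches the terms against several metadata fields.

```r
# Hypothetical helpers (not part of AnnotationHub or ensembldb):
# fetch an EnsDb by species and Ensembl version instead of by AH ID.

# Build the search terms passed to AnnotationHub::query()
ensdb_terms <- function(species, version) {
    c("EnsDb", as.character(version), species)
}

fetch_ensdb <- function(species, version,
                        hub = AnnotationHub::AnnotationHub()) {
    hits <- AnnotationHub::query(hub, ensdb_terms(species, version))
    # Refuse to guess when the terms match zero or several records
    if (length(hits) != 1L) {
        stop("expected exactly 1 matching record, found ", length(hits))
    }
    hits[[1]]
}

# Usage (needs AnnotationHub and network access; downloads on first use):
# edb <- fetch_ensdb("Homo sapiens", 87)
```

Stopping on anything other than a single hit avoids silently loading the wrong release when a version number also matches some other record.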


Also, I get the same error when running genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id.


Looks like that error was reported elsewhere too:

https://github.com/joey711/phyloseq/issues/717

Re-installing S4Vectors, ensembldb and AnnotationFilter from source might help:

BiocInstaller::biocLite(c("S4Vectors", "ensembldb", "AnnotationFilter"), type = "source")
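
On Bioconductor 3.8 and later, BiocInstaller::biocLite() has been replaced by BiocManager::install(). A rough sketch of the same reinstall for those releases follows, assuming BiocManager is installed and a compiler toolchain is available for building from source. The install call is left commented out so that nothing is reinstalled by accident.

```r
# Sketch for Bioconductor >= 3.8 (BiocManager instead of BiocInstaller).
pkgs <- c("S4Vectors", "ensembldb", "AnnotationFilter")

# Report the currently installed versions without loading the packages
# (NA means the package is not installed).
installed <- vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
        as.character(packageVersion(p))
    } else {
        NA_character_
    }
}, character(1))
print(installed)

# Reinstall from source, ideally in a fresh R session.
# force = TRUE reinstalls even if the versions are already current:
# BiocManager::install(pkgs, type = "source", force = TRUE)
```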

Yep, this solves it, thank you. Feel free to add it as an answer so I can accept it.
