Question

keys() fails on EnsDb.Hsapiens.v75 on Bioc 3.5

enricoferrero ▴ 680

@enricoferrero-6037

Last seen 4.3 years ago

Switzerland

Since upgrading to Bioconductor 3.5 I get this error when running AnnotationDbi::keys() on EnsDb.Hsapiens.v75 or other ensembldb objects:

> library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: AnnotationFilter
Warning messages:
1: multiple methods tables found for ‘rowSums’
2: multiple methods tables found for ‘colSums’
3: multiple methods tables found for ‘rowMeans’
4: multiple methods tables found for ‘colMeans’
> keys(EnsDb.Hsapiens.v75)
Error in validObject(.Object) :
  invalid class "AnnotationFilterList" object: superclass "vectorORfactor" not defined in the environment of the object's class
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRblas.so
LAPACK: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] EnsDb.Hsapiens.v75_2.1.0 ensembldb_2.0.1          AnnotationFilter_1.0.0
 [4] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0     Biobase_2.36.2
 [7] GenomicRanges_1.28.1     GenomeInfoDb_1.12.0      IRanges_2.10.1
[10] S4Vectors_0.14.1         BiocGenerics_0.22.0      nvimcom_0.9-25

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.26.0
 [3] compiler_3.4.0                XVector_0.16.0
 [5] AnnotationHub_2.8.1           ProtGenerics_1.8.0
 [7] bitops_1.0-6                  tools_3.4.0
 [9] zlibbioc_1.22.0               biomaRt_2.32.0
[11] digest_0.6.12                 RSQLite_1.1-2
[13] memoise_1.1.0                 lattice_0.20-35
[15] Matrix_1.2-10                 shiny_1.0.3
[17] DelayedArray_0.2.2            DBI_0.6-1
[19] yaml_2.1.14                   GenomeInfoDbData_0.99.0
[21] rtracklayer_1.36.0            httr_1.2.1
[23] Biostrings_2.44.0             grid_3.4.0
[25] R6_2.2.1                      XML_3.98-1.7
[27] BiocParallel_1.10.1           htmltools_0.3.6
[29] Rsamtools_1.28.0              matrixStats_0.52.2
[31] GenomicAlignments_1.12.1      SummarizedExperiment_1.6.1
[33] xtable_1.8-2                  mime_0.5
[35] interactiveDisplayBase_1.14.0 httpuv_1.3.3
[37] RCurl_1.95-4.8                lazyeval_0.2.0

keys() works fine on other annotation objects, such as the OrgDb object org.Hs.eg.db.

Any fix or workaround for getting the keys out of this object?

Thanks!

P.S.: are ensembldb EnsDb objects ever going to be supported by the Bioconductor Core Team?

ensdb.hsapiens.v75 ensembldb keys • 3.1k views

ADD COMMENT • link updated 8.8 years ago by Johannes Rainer ★ 2.1k • written 8.8 years ago by enricoferrero ▴ 680

score 1 · Answer 1 · 2017-05-15

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 16 months ago

Italy

Hi,

Can you please provide the output of sessionInfo()? I have no problem with the keys() call. To get the Ensembl gene IDs out of an EnsDb object you can also use genes():

library(EnsDb.Hsapiens.v75)
gene_ids <- genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id

head(gene_ids)
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000243485" "ENSG00000237613"
[5] "ENSG00000268020" "ENSG00000240361"
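
Once keys() works on a given installation, it can also return identifiers other than gene IDs through its keytype argument. The following is a minimal sketch, not code from this thread: ensdb_keys is a hypothetical helper name, and it assumes the ensembldb package is installed and that edb is a valid EnsDb object.

```r
# Hypothetical helper (not part of ensembldb): return all keys of one type
# from an EnsDb object, failing early if the key type is not supported.
ensdb_keys <- function(edb, keytype = "GENEID") {
  # keytypes() lists the key types the object supports, e.g. "GENEID", "TXID"
  available <- AnnotationDbi::keytypes(edb)
  if (!keytype %in% available) {
    stop("keytype must be one of: ", paste(available, collapse = ", "))
  }
  AnnotationDbi::keys(edb, keytype = keytype)
}
```

With EnsDb.Hsapiens.v75 loaded, head(ensdb_keys(EnsDb.Hsapiens.v75, "TXID")) would then list transcript IDs instead of gene IDs.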

Regarding the support of EnsDb by Bioconductor: for each Ensembl release I'll build EnsDb databases for all species in Ensembl. They will then be made available through AnnotationHub:

library(AnnotationHub)
ah <- AnnotationHub()

query(ah, c("EnsDb"))
AnnotationHub with 136 records
# snapshotDate(): 2017-04-25
# $dataprovider: Ensembl
# $species: Ailuropoda Melanoleuca, Anas Platyrhynchos, Anolis Carolinensis,...
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53185"]]'

            title                                      
  AH53185 | Ensembl 87 EnsDb for Anolis Carolinensis   
  AH53186 | Ensembl 87 EnsDb for Ailuropoda Melanoleuca
  AH53187 | Ensembl 87 EnsDb for Astyanax Mexicanus    
  ...       ...                                        
  AH53754 | Ensembl 88 EnsDb for Vicugna Pacos         
  AH53755 | Ensembl 88 EnsDb for Xiphophorus Maculatus
  AH53756 | Ensembl 88 EnsDb for Xenopus Tropicalis

ADD COMMENT • link 8.8 years ago Johannes Rainer ★ 2.1k

Hi Johannes, thanks for the quick reply. I updated the post with the sessionInfo() output.

Does that mean that the standalone packages are no longer maintained? What's the recommended way to load an EnsDb object for a particular version of Ensembl?

Thanks!

ADD REPLY • link 8.8 years ago enricoferrero ▴ 680

The standalone packages are "stable". But be aware that e.g. EnsDb.Hsapiens.v75 is built on Ensembl release 75, so it will never change and never be updated! To get the human gene definitions from Ensembl release 87 you can use AnnotationHub:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2017-04-25

## Search for the corresponding data

> query(ah, c("EnsDb", "87", "Homo sapiens"))
AnnotationHub with 1 record
# snapshotDate(): 2017-04-25
# names(): AH53211
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2017-02-07
# $title: Ensembl 87 EnsDb for Homo Sapiens
# $description: Gene and protein annotations for Homo Sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein",
#   "Annotation", "87", "AHEnsDbs")
# retrieve record with 'object[["AH53211"]]'

## Fetch the resource:
> edb <- ah[["AH53211"]]
downloading from 'https://annotationhub.bioconductor.org/fetch/59949'
retrieving 1 resource
  |======================================================================| 100%
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.

ADD REPLY • link 8.8 years ago Johannes Rainer ★ 2.1k

This is cool, but is there a way to grab the EnsDb object without having to look up its name in the AnnotationHub (i.e. AH53211)?

BTW, I didn't mean to hijack my own thread: I still get that error when trying to access the EnsDb objects with keys() or genes().

ADD REPLY • link 8.8 years ago enricoferrero ▴ 680

Yes, you can access it directly by its ID. I would expect the ID to be fairly stable, but I'm not sure.

> library(AnnotationHub)
> edb <- AnnotationHub()[["AH53211"]]
snapshotDate(): 2017-04-25
require("ensembldb")
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
>

You can do the same with query, since indexing its result with [[1]] returns the first matching resource. Instead of specifying the package name as below, you could also use separate keywords for EnsDb, species and Ensembl version:

> edb <- query(AnnotationHub(), "EnsDb.Hsapiens.v87")[[1]]
snapshotDate(): 2017-04-25
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
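
Taking [[1]] from a keyword query silently picks the first hit, which may be the wrong record if the keywords match more than one. The following is a minimal sketch of a stricter lookup by species and release, not code from this thread. The helper names ensdb_title and fetch_ensdb are hypothetical, and the sketch assumes that record titles follow the "Ensembl 87 EnsDb for Homo Sapiens" pattern shown in the query output earlier in this thread.

```r
# Hypothetical helper: build the expected record title in lower case.
# Assumption: titles look like "Ensembl <release> EnsDb for <species>".
ensdb_title <- function(species, release) {
  tolower(paste("Ensembl", release, "EnsDb for", species))
}

# Hypothetical helper: fetch the EnsDb for one species and Ensembl release
# without knowing its AnnotationHub ID in advance.
fetch_ensdb <- function(species, release,
                        hub = AnnotationHub::AnnotationHub()) {
  # Keyword search first; a keyword such as "87" can match other fields too
  hits <- AnnotationHub::query(hub, c("EnsDb", species, as.character(release)))
  # Narrow to an exact, case-insensitive title match
  hits <- hits[tolower(hits$title) == ensdb_title(species, release)]
  if (length(hits) != 1L) {
    stop("expected exactly 1 matching EnsDb record, found ", length(hits))
  }
  hits[[1]]  # downloads the resource, or loads it from the local cache
}
```

A call such as edb <- fetch_ensdb("Homo sapiens", 87) would then stop with an error instead of returning an unintended record when the match is ambiguous or empty.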

ADD REPLY • link 8.8 years ago Johannes Rainer ★ 2.1k

Also, I get the same error when using genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id

ADD REPLY • link 8.8 years ago enricoferrero ▴ 680

Looks like that error was reported elsewhere too:

https://github.com/joey711/phyloseq/issues/717

Re-installing S4Vectors, ensembldb and AnnotationFilter from source might help:

BiocInstaller::biocLite(c("S4Vectors", "ensembldb", "AnnotationFilter"), type = "source")

ADD REPLY • link 8.8 years ago Johannes Rainer ★ 2.1k

Yep, this solves it, thank you. Feel free to add it as an answer so I can accept it.

ADD REPLY • link 8.8 years ago enricoferrero ▴ 680

score 0 · Answer 2 · 2017-05-15

Johannes Rainer ★ 2.1k

@johannes-rainer-6987

Last seen 16 months ago

Italy

Looks like that error was reported elsewhere too:

https://github.com/joey711/phyloseq/issues/717

Re-installing S4Vectors, ensembldb and AnnotationFilter from source might help:

BiocInstaller::biocLite(c("S4Vectors", "ensembldb", "AnnotationFilter"), type = "source")
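
Before and after such a reinstall it can help to see which versions of the affected packages are installed and under which R version each was built. The following is a minimal sketch using only base R, not code from this thread; build_report is a hypothetical helper name. It cannot detect stale class definitions directly, so treat it as a quick consistency check rather than a diagnosis.

```r
# Hypothetical helper: report the installed version and the R version each
# package was built under. Packages that are not installed are skipped.
build_report <- function(pkgs) {
  ip <- installed.packages()
  found <- pkgs[pkgs %in% rownames(ip)]
  data.frame(
    package = found,
    version = unname(ip[found, "Version"]),
    built_under_r = unname(ip[found, "Built"]),
    stringsAsFactors = FALSE
  )
}

build_report(c("S4Vectors", "ensembldb", "AnnotationFilter"))

# Note: BiocInstaller was replaced by BiocManager from Bioconductor 3.8 on.
# There, the equivalent reinstall would be along the lines of:
# BiocManager::install(c("S4Vectors", "ensembldb", "AnnotationFilter"),
#                      type = "source", force = TRUE)
```

Restart R after reinstalling so that the freshly installed packages, rather than the ones already loaded in the session, are used.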

ADD COMMENT • link 8.8 years ago Johannes Rainer ★ 2.1k