Question: keys() fails on EnsDb.Hsapiens.v75 on Bioc 3.5
6 months ago, enricoferrero (United Kingdom) wrote:

Since upgrading to Bioconductor 3.5 I get this error when running AnnotationDbi::keys() on EnsDb.Hsapiens.v75 or other ensembldb objects:

> library(EnsDb.Hsapiens.v75)
Loading required package: ensembldb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: AnnotationFilter
Warning messages:
1: multiple methods tables found for ‘rowSums’
2: multiple methods tables found for ‘colSums’
3: multiple methods tables found for ‘rowMeans’
4: multiple methods tables found for ‘colMeans’
> keys(EnsDb.Hsapiens.v75)
Error in validObject(.Object) :
  invalid class "AnnotationFilterList" object: superclass "vectorORfactor" not defined in the environment of the object's class
> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRblas.so
LAPACK: /GWD/bioinfo/projects/cb-software/personal/ef884766/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] EnsDb.Hsapiens.v75_2.1.0 ensembldb_2.0.1          AnnotationFilter_1.0.0
 [4] GenomicFeatures_1.28.0   AnnotationDbi_1.38.0     Biobase_2.36.2
 [7] GenomicRanges_1.28.1     GenomeInfoDb_1.12.0      IRanges_2.10.1
[10] S4Vectors_0.14.1         BiocGenerics_0.22.0      nvimcom_0.9-25

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.26.0
 [3] compiler_3.4.0                XVector_0.16.0
 [5] AnnotationHub_2.8.1           ProtGenerics_1.8.0
 [7] bitops_1.0-6                  tools_3.4.0
 [9] zlibbioc_1.22.0               biomaRt_2.32.0
[11] digest_0.6.12                 RSQLite_1.1-2
[13] memoise_1.1.0                 lattice_0.20-35
[15] Matrix_1.2-10                 shiny_1.0.3
[17] DelayedArray_0.2.2            DBI_0.6-1
[19] yaml_2.1.14                   GenomeInfoDbData_0.99.0
[21] rtracklayer_1.36.0            httr_1.2.1
[23] Biostrings_2.44.0             grid_3.4.0
[25] R6_2.2.1                      XML_3.98-1.7
[27] BiocParallel_1.10.1           htmltools_0.3.6
[29] Rsamtools_1.28.0              matrixStats_0.52.2
[31] GenomicAlignments_1.12.1      SummarizedExperiment_1.6.1
[33] xtable_1.8-2                  mime_0.5
[35] interactiveDisplayBase_1.14.0 httpuv_1.3.3
[37] RCurl_1.95-4.8                lazyeval_0.2.0

keys() works fine on other OrgDb objects such as org.Hs.eg.db.

Any fix or workaround for getting the keys out of this object?

Thanks!

P.S.: are ensembldb OrgDb objects ever going to be supported by the Bioconductor Core Team?

 

written 6 months ago by enricoferrero, modified 6 months ago by Johannes Rainer

6 months ago, Johannes Rainer (Italy) wrote:

Hi,

Can you please provide the output of sessionInfo()? I have no problem with the keys() call. To get the Ensembl gene IDs out of an EnsDb object you can also use genes():

library(EnsDb.Hsapiens.v75)
gene_ids <- genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id

head(gene_ids)
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000243485" "ENSG00000237613"
[5] "ENSG00000268020" "ENSG00000240361"

Regarding the support of EnsDb by Bioconductor: for each Ensembl release I'll build EnsDb databases for all species in Ensembl. They will then be made available through AnnotationHub:

library(AnnotationHub)
ah <- AnnotationHub()

query(ah, c("EnsDb"))
AnnotationHub with 136 records
# snapshotDate(): 2017-04-25
# $dataprovider: Ensembl
# $species: Ailuropoda Melanoleuca, Anas Platyrhynchos, Anolis Carolinensis,...
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53185"]]'

            title                                      
  AH53185 | Ensembl 87 EnsDb for Anolis Carolinensis   
  AH53186 | Ensembl 87 EnsDb for Ailuropoda Melanoleuca
  AH53187 | Ensembl 87 EnsDb for Astyanax Mexicanus    
  ...       ...                                        
  AH53754 | Ensembl 88 EnsDb for Vicugna Pacos         
  AH53755 | Ensembl 88 EnsDb for Xiphophorus Maculatus
  AH53756 | Ensembl 88 EnsDb for Xenopus Tropicalis   
written 6 months ago by Johannes Rainer

Hi Johannes, thanks for the quick reply. I updated the post with the sessionInfo() output.

Does that mean that the standalone packages are no longer maintained? What's the recommended way to load an EnsDb object for a particular version of Ensembl?

Thanks!

written 6 months ago by enricoferrero, modified 6 months ago

The standalone packages are "stable". But be aware that e.g. EnsDb.Hsapiens.v75 is built on Ensembl release 75, so it will never change and never be updated. To get the human gene definitions from Ensembl release 87 you can use AnnotationHub:

> library(AnnotationHub)
> ah <- AnnotationHub()
snapshotDate(): 2017-04-25

## Search for the corresponding data

> query(ah, c("EnsDb", "87", "Homo sapiens"))
AnnotationHub with 1 record
# snapshotDate(): 2017-04-25
# names(): AH53211
# $dataprovider: Ensembl
# $species: Homo Sapiens
# $rdataclass: EnsDb
# $rdatadateadded: 2017-02-07
# $title: Ensembl 87 EnsDb for Homo Sapiens
# $description: Gene and protein annotations for Homo Sapiens based on Ensem...
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: ensembl
# $sourceurl: http://www.ensembl.org
# $sourcesize: NA
# $tags: c("EnsDb", "Ensembl", "Gene", "Transcript", "Protein",
#   "Annotation", "87", "AHEnsDbs")
# retrieve record with 'object[["AH53211"]]'

## Fetch the resource:
> edb <- ah[["AH53211"]]
downloading from 'https://annotationhub.bioconductor.org/fetch/59949'
retrieving 1 resource
  |======================================================================| 100%
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
written 6 months ago by Johannes Rainer

This is cool, but is there a way to grab the EnsDb object without having to look up its name in the AnnotationHub (i.e. AH53211)?

BTW, I didn't mean to hijack my own thread: I still get that error when trying to access the EnsDb objects with keys() or genes().

written 6 months ago by enricoferrero

Yes, you can access it directly by its name (the AnnotationHub ID). I would expect the ID to be fairly stable, but I'm not sure.

> library(AnnotationHub)
> edb <- AnnotationHub()[["AH53211"]]
snapshotDate(): 2017-04-25
require("ensembldb")
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
>

You can do the same with query(), as that returns an AnnotationHub object that you can subset with [[ (instead of specifying the package name as below, you could also use separate keywords for EnsDb, species and Ensembl version):

> edb <- query(AnnotationHub(), "EnsDb.Hsapiens.v87")[[1]]
snapshotDate(): 2017-04-25
loading from cache '/Users/jo//.AnnotationHub/59949'
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.2.4
|Creation time: Sat Jan 14 10:20:23 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 1.0
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.
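
The keyword-based lookup mentioned above can be wrapped in a small helper, so that the AH identifier never has to be looked up by hand. This is only a sketch: ensdb_query_terms() and fetch_ensdb() are hypothetical names, not part of AnnotationHub or ensembldb, and the helper assumes that the three keywords match exactly one record, as they did for Ensembl 87 in the query output shown earlier in this thread.

```r
## Hypothetical helper: build the keyword vector passed to query().
ensdb_query_terms <- function(species, version) {
  c("EnsDb", as.character(version), species)
}

## Hypothetical helper: fetch the single EnsDb matching species + version.
## Assumes the AnnotationHub package is installed and the hub is reachable.
fetch_ensdb <- function(species, version,
                        hub = AnnotationHub::AnnotationHub()) {
  hits <- AnnotationHub::query(hub, ensdb_query_terms(species, version))
  if (length(hits) != 1L) {
    stop(sprintf("expected exactly 1 matching record, found %d",
                 length(hits)))
  }
  hits[[1]]  # downloads the resource, or loads it from the local cache
}

## Usage (needs network access, so it is left commented out):
## edb <- fetch_ensdb("Homo sapiens", 87)
```

Stopping when the match is not unique is a deliberate choice here: query() matches the keywords against the record metadata, so a short keyword such as a version number could in principle hit more than one record.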

 

written 6 months ago by Johannes Rainer

Also, I get the same error when using genes(EnsDb.Hsapiens.v75, columns = "gene_id")$gene_id

written 6 months ago by enricoferrero

Looks like that error was reported elsewhere too:

https://github.com/joey711/phyloseq/issues/717

Re-installing S4Vectors, ensembldb and AnnotationFilter from source might help:

BiocInstaller::biocLite(c("S4Vectors", "ensembldb", "AnnotationFilter"), type = "source")
written 6 months ago by Johannes Rainer

Yep, this solves it, thank you. Feel free to add it as an answer so I can accept it.

written 6 months ago by enricoferrero

6 months ago, Johannes Rainer (Italy) wrote:

Looks like that error was reported elsewhere too:

https://github.com/joey711/phyloseq/issues/717

Re-installing S4Vectors, ensembldb and AnnotationFilter from source might help:

BiocInstaller::biocLite(c("S4Vectors", "ensembldb", "AnnotationFilter"), type = "source")
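
To see what is actually installed before and after the re-install, the versions of the three packages can be listed and compared with the sessionInfo() output in the question. A minimal sketch using only base R; installed_versions() is a hypothetical helper written for this thread, not a Bioconductor function.

```r
## Hypothetical helper: return the installed version of each package,
## or NA if the package is not installed.
installed_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    tryCatch(as.character(utils::packageVersion(p)),
             error = function(e) NA_character_)
  }, character(1))
}

print(installed_versions(c("S4Vectors", "ensembldb", "AnnotationFilter")))
```

The check only reads package metadata and does not load the packages, so it can be run safely in the session that shows the error. Restart R after re-installing so that the freshly installed packages are the ones that get loaded.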

 

written 6 months ago by Johannes Rainer