Question

AnnotationHub errors when querying

wewolski ▴ 10

I have to work with Ensembl IDs and would like to map gene identifiers to protein identifiers.

Without much reading (apart from the AnnotationHub HOWTO vignette, https://bioconductor.org/packages/release/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html), I am executing the following sequence of commands. The only modification is that instead of an OrgDb I am querying for ENSEMBL directly, or so I think. The reason is that with the OrgDb example I can map only 60% of the Ensembl protein IDs to Ensembl gene IDs.

library(AnnotationHub)
ah = AnnotationHub()

ens <- query(ah, "ENSEMBL")
ens$species
grep("Canis",(unique(ens$species)), value = TRUE)

ensmbl_CLF <- query(ah, c("ENSEMBL", "Canis lupus familiaris"))

This seems to work fine until I hit:

 clf <- ensmbl_CLF[[1]]
downloading 1 resources
retrieving 1 resource
  |==========================================================================================================================================================================| 100%

loading from cache
require(“ensembldb”)
Error: failed to load resource
  name: AH67922
  title: Ensembl 95 EnsDb for Canis lupus familiaris
  reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘ensembldb’

I probably do not fully understand what I am doing, so any answer would be helpful.

Updated 4.1 years ago by Guido Hooiveld ★ 4.0k • written 4.1 years ago by wewolski ▴ 10

score 2 · Accepted Answer · 2020-07-03

Please see the last line of the output:

there is no package called ‘ensembldb’

You should install this package first:

BiocManager::install("ensembldb")
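If you want the script to check for this itself, something along these lines should do. It is only a sketch: it assumes you have permission to install packages and that BiocManager can reach the Bioconductor repositories. The variable names `pkgs` and `not_installed` are mine, purely for illustration.

```r
# Packages needed to load an EnsDb resource from AnnotationHub.
pkgs <- c("AnnotationHub", "ensembldb")

# requireNamespace() returns FALSE (instead of throwing an error)
# when a package is not installed.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(not_installed) > 0) {
  # BiocManager itself lives on CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(not_installed)
}
```

Run it once before the AnnotationHub code; if nothing is missing it does nothing.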

The error message shows that AnnotationHub tried to load ensembldb with require(), which failed because the package is not installed:

reason: require(“ensembldb”) failed: use BiocManager::install() to install package?

It will then work. Using your code:

> ensmbl_CLF
AnnotationHub with 32 records
# snapshotDate(): 2020-04-27
# $dataprovider: Ensembl
# $species: canis lupus familiarisgreatdane, canis lupus familiarisbasenji, ...
# $rdataclass: GRanges, TwoBitFile, EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH67922"]]' 

            title                                                              
  AH67922 | Ensembl 95 EnsDb for Canis lupus familiaris                        
  AH69155 | Ensembl 96 EnsDb for Canis lupus familiaris                        
  AH73846 | Ensembl 97 EnsDb for Canis lupus familiaris                        
  AH74973 | Ensembl 98 EnsDb for Canis lupus familiaris                        
  AH78741 | Ensembl 99 EnsDb for Canis lupus familiaris                        
  ...       ...                                                                
  AH82119 | Canis_lupus_familiarisbasenji.Basenji_breed-1.1.ncrna.2bit         
  AH82120 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.cdna.all.2bit       
  AH82121 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.dna_rm.toplevel.2bit
  AH82122 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.dna_sm.toplevel.2bit
  AH82123 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.ncrna.2bit          
>

For annotation, I assume you will want the latest available Ensembl annotation release for dog (which apparently is EnsDb version 99).

EnsDb.dog <- query(ah, c("EnsDb", "Canis lupus familiaris", "99"))

# fetch the v99 EnsDb and put it in the cache.
EnsDb.dog <- EnsDb.dog[["AH78741"]]

# sample query
k <- keys(EnsDb.dog)[1:10]

# retrieve some annotation info
# (select() is exported by AnnotationDbi, so `::` is sufficient; `:::` is not needed)
annotations <- AnnotationDbi::select(EnsDb.dog, keys = k, keytype = "GENEID",
   columns = c("GENEID", "GENENAME", "DESCRIPTION", "PROTEINID", "UNIPROTID", "UNIPROTDB"))

head(annotations)
              GENEID GENENAME
1 ENSCAFG00000000001    ENPP1
2 ENSCAFG00000000002         
3 ENSCAFG00000000005   PARD6G
4 ENSCAFG00000000007    ADNP2
5 ENSCAFG00000000008   TXNL4A
6 ENSCAFG00000000008   TXNL4A
                                                                             DESCRIPTION
1 ectonucleotide pyrophosphatase/phosphodiesterase 1 [Source:VGNC Symbol;Acc:VGNC:40374]
2                                                                                   NULL
3         par-6 family cell polarity regulator gamma [Source:VGNC Symbol;Acc:VGNC:53749]
4                                    ADNP homeobox 2 [Source:VGNC Symbol;Acc:VGNC:37663]
5                                thioredoxin like 4A [Source:VGNC Symbol;Acc:VGNC:48019]
6                                thioredoxin like 4A [Source:VGNC Symbol;Acc:VGNC:48019]
           PROTEINID UNIPROTID UNIPROTDB
1 ENSCAFP00000000001    F1PJP0  SPTREMBL
2 ENSCAFP00000041865    J9NT13  SPTREMBL
3 ENSCAFP00000000006    F1PJN8  SPTREMBL
4 ENSCAFP00000000007    F1PJN7  SPTREMBL
5 ENSCAFP00000050103      <NA>      <NA>
6 ENSCAFP00000000008    E2R204  SPTREMBL
>
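Since your actual goal is to go between protein IDs and gene IDs, the same select() call can also be run in the other direction, starting from the protein IDs. A minimal sketch, assuming this EnsDb contains protein annotations and offers "PROTEINID" as a keytype (the code checks both before querying). The three protein IDs are taken from the output above; `protein_ids` and `prot2gene` are illustrative names only.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()
edb <- ah[["AH78741"]]  # Ensembl 99 EnsDb for Canis lupus familiaris

# Only proceed if the database carries protein annotations
# and can be queried by protein ID.
stopifnot(hasProteinData(edb))
stopifnot("PROTEINID" %in% AnnotationDbi::keytypes(edb))

# A few Ensembl protein IDs (copied from the output above).
protein_ids <- c("ENSCAFP00000000001",
                 "ENSCAFP00000000006",
                 "ENSCAFP00000000007")

# Map protein IDs to gene IDs and gene names.
prot2gene <- AnnotationDbi::select(edb, keys = protein_ids,
                                   keytype = "PROTEINID",
                                   columns = c("PROTEINID", "GENEID", "GENENAME"))

# Fraction of the input protein IDs that could be mapped.
mean(protein_ids %in% prot2gene$PROTEINID)
```

With your full list of protein IDs in place of `protein_ids`, the last line gives the mapping rate you can compare against the 60% you got from the OrgDb.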

Lastly, if you would like to install and save the EnsDb to have a local copy, check this recent thread here.