AnnotationHub errors when querying
wewolski ▴ 10
@wewolski-8499
Zurich

I have to work with Ensembl IDs and would like to map gene identifiers to protein identifiers.

So, without much reading (apart from https://bioconductor.org/packages/release/bioc/vignettes/AnnotationHub/inst/doc/AnnotationHub-HOWTO.html), I am executing the following sequence of commands. The only modification is that instead of an OrgDb I am querying for ENSEMBL directly, or so I think (the reason being that with the OrgDb example I can map only 60% of Ensembl protein IDs to Ensembl gene IDs).

library(AnnotationHub)
ah = AnnotationHub()

ens <- query(ah, "ENSEMBL")
ens$species
grep("Canis",(unique(ens$species)), value = TRUE)

ensmbl_CLF<- query(ah, c("ENSEMBL",  "Canis lupus familiaris"))

This seems to work fine until I hit:

 clf <- ensmbl_CLF[[1]]
downloading 1 resources
retrieving 1 resource
  |==========================================================================================================================================================================| 100%

loading from cache
require(“ensembldb”)
Error: failed to load resource
  name: AH67922
  title: Ensembl 95 EnsDb for Canis lupus familiaris
  reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘ensembldb’

I am surely going about this rather cluelessly, so any answer would be helpful.

Guido Hooiveld ★ 3.9k
@guido-hooiveld-2020
Wageningen University, Wageningen, the …

Please see the last line of the output:

there is no package called ‘ensembldb’

You should install this package first:

BiocManager::install("ensembldb")

AnnotationHub tried to load the package with require() in order to open the resource, but the package is not installed, hence the hint in the error message:

reason: require(“ensembldb”) failed: use BiocManager::install() to install package?
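To check up front which packages are missing, something like the following minimal sketch may help. The helper name `missing_packages` is hypothetical (it is not part of AnnotationHub or BiocManager); it only relies on base R's `requireNamespace()`. The install call is left commented out so that nothing is installed unintentionally.

```r
# Hypothetical helper: return the entries of `pkgs` that cannot be loaded,
# i.e. packages that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages needed to load an EnsDb resource from AnnotationHub.
to_install <- missing_packages(c("AnnotationHub", "ensembldb"))

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # BiocManager::install(to_install)  # uncomment to install them
}
```

Note that `requireNamespace()` loads the namespace of every package it finds, which is harmless here but can take a moment for large packages.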

It will then work. Using your code:

> ensmbl_CLF
AnnotationHub with 32 records
# snapshotDate(): 2020-04-27
# $dataprovider: Ensembl
# $species: canis lupus familiarisgreatdane, canis lupus familiarisbasenji, ...
# $rdataclass: GRanges, TwoBitFile, EnsDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH67922"]]' 

            title                                                              
  AH67922 | Ensembl 95 EnsDb for Canis lupus familiaris                        
  AH69155 | Ensembl 96 EnsDb for Canis lupus familiaris                        
  AH73846 | Ensembl 97 EnsDb for Canis lupus familiaris                        
  AH74973 | Ensembl 98 EnsDb for Canis lupus familiaris                        
  AH78741 | Ensembl 99 EnsDb for Canis lupus familiaris                        
  ...       ...                                                                
  AH82119 | Canis_lupus_familiarisbasenji.Basenji_breed-1.1.ncrna.2bit         
  AH82120 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.cdna.all.2bit       
  AH82121 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.dna_rm.toplevel.2bit
  AH82122 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.dna_sm.toplevel.2bit
  AH82123 | Canis_lupus_familiarisgreatdane.UMICH_Zoey_3.1.ncrna.2bit          
>

For annotation, I assume you will want the latest Ensembl annotation available for dog (which, in this snapshot, appears to be the EnsDb for Ensembl release 99).

EnsDb.dog <- query(ah, c("EnsDb", "Canis lupus familiaris", "99"))

# fetch the v99 EnsDb and put it in the cache.
EnsDb.dog <- EnsDb.dog[["AH78741"]]

# sample query
k <- keys(EnsDb.dog)[1:10]

# retrieve some annotation info
annotations <- AnnotationDbi::select(EnsDb.dog, keys = k, keytype = "GENEID",
   columns = c("GENEID", "GENENAME", "DESCRIPTION", "PROTEINID", "UNIPROTID",  "UNIPROTDB"))

head(annotations)
              GENEID GENENAME
1 ENSCAFG00000000001    ENPP1
2 ENSCAFG00000000002         
3 ENSCAFG00000000005   PARD6G
4 ENSCAFG00000000007    ADNP2
5 ENSCAFG00000000008   TXNL4A
6 ENSCAFG00000000008   TXNL4A
                                                                             DESCRIPTION
1 ectonucleotide pyrophosphatase/phosphodiesterase 1 [Source:VGNC Symbol;Acc:VGNC:40374]
2                                                                                   NULL
3         par-6 family cell polarity regulator gamma [Source:VGNC Symbol;Acc:VGNC:53749]
4                                    ADNP homeobox 2 [Source:VGNC Symbol;Acc:VGNC:37663]
5                                thioredoxin like 4A [Source:VGNC Symbol;Acc:VGNC:48019]
6                                thioredoxin like 4A [Source:VGNC Symbol;Acc:VGNC:48019]
           PROTEINID UNIPROTID UNIPROTDB
1 ENSCAFP00000000001    F1PJP0  SPTREMBL
2 ENSCAFP00000041865    J9NT13  SPTREMBL
3 ENSCAFP00000000006    F1PJN8  SPTREMBL
4 ENSCAFP00000000007    F1PJN7  SPTREMBL
5 ENSCAFP00000050103      <NA>      <NA>
6 ENSCAFP00000000008    E2R204  SPTREMBL
>
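Since the original question mentions that only about 60% of the protein IDs could be mapped, it may be worth checking the coverage of whatever table select() returns. Below is a minimal sketch in base R. The data frame is a toy copy of two columns from the head(annotations) output above, `mapped_fraction` is a hypothetical helper, and `my_proteins` contains one made-up ID for illustration. In practice you would use the full select() result and your own ID vector.

```r
# Toy mapping table copied from the head(annotations) output above;
# in practice this would be the full data.frame returned by select().
annotations <- data.frame(
  GENEID = c("ENSCAFG00000000001", "ENSCAFG00000000002", "ENSCAFG00000000005",
             "ENSCAFG00000000007", "ENSCAFG00000000008", "ENSCAFG00000000008"),
  PROTEINID = c("ENSCAFP00000000001", "ENSCAFP00000041865", "ENSCAFP00000000006",
                "ENSCAFP00000000007", "ENSCAFP00000050103", "ENSCAFP00000000008"),
  stringsAsFactors = FALSE
)

# One gene can have several proteins (see ENSCAFG00000000008),
# so collect the protein IDs per gene into a named list.
proteins_by_gene <- split(annotations$PROTEINID, annotations$GENEID)

# Hypothetical helper: fraction of the unique query IDs present in the mapping.
mapped_fraction <- function(ids, mapped_ids) {
  mean(unique(ids) %in% mapped_ids)
}

# Two protein IDs from the table plus one made-up ID that is absent.
my_proteins <- c("ENSCAFP00000000001", "ENSCAFP00000050103", "NOT_IN_TABLE")
mapped_fraction(my_proteins, annotations$PROTEINID)
```

Here two of the three query IDs are found. IDs that stay unmapped against the full table may belong to a different Ensembl release than the EnsDb being queried, which is one reason to pick the release deliberately.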

Lastly, if you would like to install and save the EnsDb to have a local copy, check this recent thread here.
