Missing SYMBOL keytype in EnsDb.Hsapiens.v75
Zach Roe

Hi,

I'm following the AnnotationDbi introduction PDF on using the EnsDb.Hsapiens.v75 database, because I have gene symbols that need to be mapped back to Ensembl IDs. (They were originally Ensembl IDs that a previous scientist mapped to gene symbols, but that mapping has been lost, so I want to use an Ensembl-specific database to see whether I can recover the original IDs.)

Please refer to Section 0.6 of this manual, published July 7, 2016:

https://www.bioconductor.org/packages/devel/bioc/vignettes/AnnotationDbi/inst/doc/IntroToAnnotationPackages.pdf


However, when I use SYMBOL as the keytype to map to the GENEID column, I get the following error:

> mapIds(EnsDb.Hsapiens.v75, keys=keys, column="GENEID", keytype="SYMBOL", multiVals="first")

Error in .select(x = x, keys = keys, columns = columns, keytype = keytype,  : 
  keytype SYMBOL not available in the database. Use keytypes method to list all available keytypes.
In addition: Warning message:
In .select(x = x, keys = keys, columns = columns, keytype = keytype,  :
  The following columns are not available in the database and have thus been removed: SYMBOL

Checking the available keytypes and columns as described in the manual shows that SYMBOL is no longer available, contrary to the example in Section 0.6:

> library(EnsDb.Hsapiens.v75)
> edb <- EnsDb.Hsapiens.v75
> columns(edb)
 [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"     "EXONSEQSTART"   "GENEBIOTYPE"   
 [7] "GENEID"         "GENENAME"       "GENESEQEND"     "GENESEQSTART"   "ISCIRCULAR"     "SEQCOORDSYSTEM"
[13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "TXBIOTYPE"      "TXCDSSEQEND"    "TXCDSSEQSTART" 
[19] "TXID"           "TXSEQEND"       "TXSEQSTART"    
> keytypes(edb)
[1] "ENTREZID"    "EXONID"      "GENEBIOTYPE" "GENEID"      "GENENAME"    "SEQNAME"     "SEQSTRAND"   "TXBIOTYPE"  
[9] "TXID"

Session info is below.

Thank you very much!

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v75_0.99.12 ensembldb_1.4.7            GenomicFeatures_1.24.5     gageData_2.10.0           
 [5] gage_2.22.0                pathview_1.12.0            vsn_3.40.0                 BiocParallel_1.6.3        
 [9] arrayQualityMetrics_3.28.2 HGNChelper_0.3.1           DESeq2_1.12.3              SummarizedExperiment_1.2.3
[13] GenomicRanges_1.24.2       GenomeInfoDb_1.8.3         org.Hs.eg.db_3.3.0         AnnotationDbi_1.34.4      
[17] IRanges_2.6.1              S4Vectors_0.10.2           Biobase_2.32.0             BiocGenerics_0.18.0       
[21] calibrate_1.7.2            MASS_7.3-45                xlsx_0.5.7                 xlsxjars_0.6.1            
[25] rJava_0.9-8                gridExtra_2.2.1            ggplot2_2.1.0              pheatmap_1.0.8            
[29] lattice_0.20-33            RColorBrewer_1.1-2         gplots_3.0.1               reshape2_1.4.1            
[33] reshape_0.8.5              tidyr_0.5.1                dplyr_0.5.0                circlize_0.3.7            
[37] migest_1.7.2               BiocInstaller_1.22.3  

Johannes Rainer

Hi,

It is not that the SYMBOL column/keytype is no longer there; rather, it is not there yet. I've added support for the SYMBOL filter in version 1.5.9 of ensembldb (the current version in Bioconductor devel).

SYMBOL is, however, only an alias for GENENAME, so in your case you can simply use keytype = "GENENAME" instead and get the same results:

> library(ensembldb)
> library(EnsDb.Hsapiens.v75)
> keys <- c("BCL2", "BCL2L11", "ZBTB16", "NR3C1")
> mapIds(EnsDb.Hsapiens.v75, keys = keys, column = "GENEID",
+        keytype = "GENENAME", multiVals = "first")
             BCL2           BCL2L11            ZBTB16             NR3C1
"ENSG00000171791" "ENSG00000153094" "ENSG00000109906" "ENSG00000113580"
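If the same script has to run against both older and newer ensembldb releases, one possible approach is sketched below. It relies only on the `keytypes()` and `mapIds()` methods used above: choose SYMBOL when the database offers it and fall back to GENENAME otherwise. A gene symbol can correspond to more than one Ensembl gene ID, so this sketch assumes the installed ensembldb accepts `multiVals = "list"` and uses it to keep every match instead of taking only the first. That matters when you are trying to recover original IDs. The variable names `kt`, `ids` and `ambiguous` are illustrative only.

```r
library(EnsDb.Hsapiens.v75)  # also attaches ensembldb

edb <- EnsDb.Hsapiens.v75
keys <- c("BCL2", "BCL2L11", "ZBTB16", "NR3C1")

# Use SYMBOL where ensembldb provides it (1.5.9 and later); otherwise use
# GENENAME, which holds the same gene symbols.
kt <- if ("SYMBOL" %in% keytypes(edb)) "SYMBOL" else "GENENAME"

# multiVals = "list" (assumed supported by the installed ensembldb) returns
# every GENEID matching each symbol, so ambiguous symbols are not
# silently resolved to one arbitrary ID.
ids <- mapIds(edb, keys = keys, column = "GENEID",
              keytype = kt, multiVals = "list")

# Symbols that map to more than one Ensembl gene ID and need manual review
ambiguous <- names(ids)[lengths(ids) > 1]
```

Any symbol listed in `ambiguous` could come from any of several original Ensembl IDs. Choosing the first match for such a symbol would only be a guess.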