Is it possible to get Ensembl gene descriptions with ensembldb?
Julien Roux ▴ 90
@julien-roux-2710
Last seen 4.9 years ago
Switzerland/Basel/University of Basel

Dear ensembldb maintainers,

First, thanks a lot for this great resource!

I have a question regarding the "gene description" field. It does not seem to be accessible from ensembldb; at least, I couldn't find it among the available columns. For example, for the gene "Cntnap1" I would like to retrieve the following field: "contactin associated protein-like 1 [Source:MGI Symbol;Acc:MGI:1858201]" (see http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000017167).

Below is what I tried. Am I missing something?

library(AnnotationHub)
ah <- AnnotationHub()
ahDb <- query(ah, "Ensembl 88 EnsDb for Mus musculus")
edb <- ahDb[[1]]
columns(edb)

[1] "ENTREZID"            "EXONID"              "EXONIDX"             "EXONSEQEND"          "EXONSEQSTART"        "GENEBIOTYPE"       
[7] "GENEID"              "GENENAME"            "GENESEQEND"          "GENESEQSTART"        "INTERPROACCESSION"   "ISCIRCULAR"        
[13] "PROTDOMEND"          "PROTDOMSTART"        "PROTEINDOMAINID"     "PROTEINDOMAINSOURCE" "PROTEINID"           "PROTEINSEQUENCE"   
[19] "SEQCOORDSYSTEM"      "SEQLENGTH"           "SEQNAME"             "SEQSTRAND"           "SYMBOL"              "TXBIOTYPE"         
[25] "TXCDSSEQEND"         "TXCDSSEQSTART"       "TXID"                "TXNAME"              "TXSEQEND"            "TXSEQSTART"        
[31] "UNIPROTDB"           "UNIPROTID"           "UNIPROTMAPPINGTYPE"

Here is my sessionInfo():
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base     

other attached packages:
[1] AnnotationHub_2.8.1 BiocGenerics_0.22.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.11                  IRanges_2.10.2              
[3] digest_0.6.12                 mime_0.5                    
[5] R6_2.2.1                      xtable_1.8-2                
[7] DBI_0.6-1                     stats4_3.4.0                
[9] RSQLite_1.1-2                 BiocInstaller_1.26.0        
[11] httr_1.2.1                    S4Vectors_0.14.2            
[13] Biobase_2.36.2                shiny_1.0.3                 
[15] httpuv_1.3.3                  yaml_2.1.14                 
[17] compiler_3.4.0                AnnotationDbi_1.38.0        
[19] memoise_1.1.0                 htmltools_0.3.6             
[21] interactiveDisplayBase_1.14.0

Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 24 days ago
Italy

OK, gene descriptions are now available in the EnsDb databases for Ensembl 87 (I'm still working on Ensembl 88), which you can get from AnnotationHub:

> library(AnnotationHub)
> ah <- AnnotationHub()
> edb <- query(ah, "EnsDb.Hsapiens.v87")[[1]]  # query() returns a hub subset; [[1]] loads the first match
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.1
|Creation time: Thu Jun  8 07:27:44 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 2.1
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.

> genes(edb, columns = c("gene_id", "description"), return.type = "DataFrame")
DataFrame with 63970 rows and 2 columns
              gene_id
          <character>
1     ENSG00000000003
2     ENSG00000000005
3     ENSG00000000419
...               ...
63968          LRG_98
63969          LRG_99
63970         LRG_992
                                                                                         description
                                                                                         <character>
1                                                  tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]
2                                                    tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757]
3     dolichyl-phosphate mannosyltransferase subunit 1, catalytic [Source:HGNC Symbol;Acc:HGNC:3005]
...                                                                                              ...
63968                                  recombination activating 1 [Source:HGNC Symbol;Acc:HGNC:9831]
63969                                  recombination activating 2 [Source:HGNC Symbol;Acc:HGNC:9832]
63970                                         estrogen receptor 1 [Source:HGNC Symbol;Acc:HGNC:3467]

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.6.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] ensembldb_2.0.3        AnnotationFilter_1.0.0 GenomicFeatures_1.28.2
 [4] AnnotationDbi_1.38.1   Biobase_2.36.2         GenomicRanges_1.28.3  
 [7] GenomeInfoDb_1.12.1    IRanges_2.10.2         S4Vectors_0.14.3      
[10] AnnotationHub_2.8.1    BiocGenerics_0.22.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11                  BiocInstaller_1.26.0         
 [3] compiler_3.4.0                XVector_0.16.0               
 [5] ProtGenerics_1.8.0            bitops_1.0-6                 
 [7] tools_3.4.0                   zlibbioc_1.22.0              
 [9] biomaRt_2.32.0                digest_0.6.12                
[11] lattice_0.20-35               RSQLite_1.1-2                
[13] memoise_1.1.0                 Matrix_1.2-10                
[15] DelayedArray_0.2.7            shiny_1.0.3                  
[17] DBI_0.6-1                     curl_2.6                     
[19] yaml_2.1.14                   GenomeInfoDbData_0.99.0      
[21] rtracklayer_1.36.3            httr_1.2.1                   
[23] Biostrings_2.44.1             grid_3.4.0                   
[25] R6_2.2.1                      XML_3.98-1.7                 
[27] BiocParallel_1.10.1           matrixStats_0.52.2           
[29] Rsamtools_1.28.0              htmltools_0.3.6              
[31] GenomicAlignments_1.12.1      SummarizedExperiment_1.6.3   
[33] mime_0.5                      interactiveDisplayBase_1.14.0
[35] xtable_1.8-2                  httpuv_1.3.3                 
[37] lazyeval_0.2.0                RCurl_1.95-4.8

So, EnsDb databases with a DBSCHEMAVERSION >= 2.1 now also contain gene descriptions.

Hope this helps.
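
The version check and the lookup asked for in the question could be sketched as follows. This is only an illustration under stated assumptions: the helper names `supports_description` and `describe_gene` are hypothetical (they are not part of ensembldb), the metadata table is mocked here so the check can be tried without downloading a database, and treating a missing DBSCHEMAVERSION entry as "no descriptions" is an assumption about older databases.

```r
# Hypothetical helper: TRUE if a metadata table (a data.frame with columns
# `name` and `value`, as returned by metadata(edb)) reports a
# DBSCHEMAVERSION of at least 2.1.
supports_description <- function(md) {
  version <- as.character(md$value[md$name == "DBSCHEMAVERSION"])
  # Assumption: databases without this entry predate schema 2.1.
  if (length(version) == 0) return(FALSE)
  numeric_version(version[1]) >= numeric_version("2.1")
}

# Mocked metadata for illustration only; with a real database one would
# call supports_description(metadata(edb)) instead.
md <- data.frame(name = c("ensembl_version", "DBSCHEMAVERSION"),
                 value = c("87", "2.1"),
                 stringsAsFactors = FALSE)
supports_description(md)

# Hypothetical wrapper for a single gene symbol. It needs library(ensembldb)
# and an EnsDb object `edb`, so it is defined here but not called.
describe_gene <- function(edb, symbol) {
  stopifnot(supports_description(metadata(edb)))
  genes(edb,
        filter = GeneNameFilter(symbol),
        columns = c("gene_id", "gene_name", "description"),
        return.type = "data.frame")
}
```

With a mouse EnsDb of a sufficiently recent schema loaded as `edb`, `describe_gene(edb, "Cntnap1")` would be the call corresponding to the original question.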

Johannes Rainer ★ 2.0k
@johannes-rainer-6987
Last seen 24 days ago
Italy

Dear Julien,

You're right: gene descriptions are not (yet) in the database. I opened an issue (https://github.com/jotsetung/ensembldb/issues/49), so there is a chance I will add them in the not too distant future.

cheers, jo

Julien Roux ▴ 90
@julien-roux-2710
Last seen 4.9 years ago
Switzerland/Basel/University of Basel

Great, thanks for the quick answer!
