Question: Is it possible to get Ensembl gene descriptions with ensembldb?
Julien Roux (Switzerland) wrote 5 months ago:

Dear ensembldb maintainers,

First, thanks a lot for this great resource!

I have a question regarding the "gene description" field. It does not seem to be accessible from ensembldb; at least, I couldn't find it among the available columns. For example, for the gene "Cntnap1" I would like to retrieve the following field: "contactin associated protein-like 1 [Source:MGI Symbol;Acc:MGI:1858201]" (see http://www.ensembl.org/Mus_musculus/Gene/Summary?db=core;g=ENSMUSG00000017167).

Below is what I tried. Am I missing something?

library(AnnotationHub)
ah <- AnnotationHub()
ahDb <- query(ah, "Ensembl 88 EnsDb for Mus musculus")
edb <- ahDb[[1]]
columns(edb)

[1] "ENTREZID"            "EXONID"              "EXONIDX"             "EXONSEQEND"          "EXONSEQSTART"        "GENEBIOTYPE"       
[7] "GENEID"              "GENENAME"            "GENESEQEND"          "GENESEQSTART"        "INTERPROACCESSION"   "ISCIRCULAR"        
[13] "PROTDOMEND"          "PROTDOMSTART"        "PROTEINDOMAINID"     "PROTEINDOMAINSOURCE" "PROTEINID"           "PROTEINSEQUENCE"   
[19] "SEQCOORDSYSTEM"      "SEQLENGTH"           "SEQNAME"             "SEQSTRAND"           "SYMBOL"              "TXBIOTYPE"         
[25] "TXCDSSEQEND"         "TXCDSSEQSTART"       "TXID"                "TXNAME"              "TXSEQEND"            "TXSEQSTART"        
[31] "UNIPROTDB"           "UNIPROTID"           "UNIPROTMAPPINGTYPE"

Here is my sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base     

other attached packages:
[1] AnnotationHub_2.8.1 BiocGenerics_0.22.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.11                  IRanges_2.10.2              
[3] digest_0.6.12                 mime_0.5                    
[5] R6_2.2.1                      xtable_1.8-2                
[7] DBI_0.6-1                     stats4_3.4.0                
[9] RSQLite_1.1-2                 BiocInstaller_1.26.0        
[11] httr_1.2.1                    S4Vectors_0.14.2            
[13] Biobase_2.36.2                shiny_1.0.3                 
[15] httpuv_1.3.3                  yaml_2.1.14                 
[17] compiler_3.4.0                AnnotationDbi_1.38.0        
[19] memoise_1.1.0                 htmltools_0.3.6             
[21] interactiveDisplayBase_1.14.0

 

Johannes Rainer (Italy) wrote 5 months ago:

Dear Julien,

You're right: gene descriptions are not (yet) in the database. I opened an issue (https://github.com/jotsetung/ensembldb/issues/49), so there is a chance I will add them in the not too distant future.

cheers, jo

Johannes Rainer (Italy) wrote 5 months ago:

OK, gene descriptions are now available in the Ensembl 87 EnsDb databases, which you can get from AnnotationHub (I'm still working on Ensembl 88):

> library(AnnotationHub)
> ah <- AnnotationHub()
> edb <- query(ah, "EnsDb.Hsapiens.v87")[[1]]
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.1
|Creation time: Thu Jun  8 07:27:44 2017
|ensembl_version: 87
|ensembl_host: localhost
|Organism: homo_sapiens
|taxonomy_id: 9606
|genome_build: GRCh38
|DBSCHEMAVERSION: 2.1
| No. of genes: 63970.
| No. of transcripts: 216741.
|Protein data available.

> genes(edb, columns = c("gene_id", "description"), return.type = "DataFrame")
DataFrame with 63970 rows and 2 columns
              gene_id
          <character>
1     ENSG00000000003
2     ENSG00000000005
3     ENSG00000000419
...               ...
63968          LRG_98
63969          LRG_99
63970         LRG_992
                                                                                         description
                                                                                         <character>
1                                                  tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]
2                                                    tenomodulin [Source:HGNC Symbol;Acc:HGNC:17757]
3     dolichyl-phosphate mannosyltransferase subunit 1, catalytic [Source:HGNC Symbol;Acc:HGNC:3005]
...                                                                                              ...
63968                                  recombination activating 1 [Source:HGNC Symbol;Acc:HGNC:9831]
63969                                  recombination activating 2 [Source:HGNC Symbol;Acc:HGNC:9832]
63970                                         estrogen receptor 1 [Source:HGNC Symbol;Acc:HGNC:3467]
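The description strings above carry a trailing source annotation such as "[Source:HGNC Symbol;Acc:HGNC:11858]". If only the plain gene name is wanted, a minimal base-R sketch could split the two parts. It assumes every annotation follows that bracketed pattern; the helper name split_description is hypothetical and not part of ensembldb.

```r
# Hypothetical helper (not part of ensembldb): split an Ensembl gene
# description into its free-text name and the bracketed source annotation.
split_description <- function(x) {
  # Assumed pattern: "<name> [Source:<db>;Acc:<accession>]"
  pattern <- "^(.*?)\\s*\\[Source:([^;]*);Acc:([^]]*)\\]\\s*$"
  has_src <- grepl(pattern, x, perl = TRUE)
  data.frame(
    # Strings without a recognizable annotation are returned unchanged.
    name = ifelse(has_src, sub(pattern, "\\1", x, perl = TRUE), x),
    source = ifelse(has_src, sub(pattern, "\\2", x, perl = TRUE), NA_character_),
    accession = ifelse(has_src, sub(pattern, "\\3", x, perl = TRUE), NA_character_),
    stringsAsFactors = FALSE
  )
}

# Two descriptions quoted in this thread.
desc <- c(
  "tetraspanin 6 [Source:HGNC Symbol;Acc:HGNC:11858]",
  "contactin associated protein-like 1 [Source:MGI Symbol;Acc:MGI:1858201]"
)
parsed <- split_description(desc)
```

The same function could presumably be applied to the description column of the DataFrame returned by genes(), for example split_description(res$description) where res holds that result.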

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin16.6.0/x86_64 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] ensembldb_2.0.3        AnnotationFilter_1.0.0 GenomicFeatures_1.28.2
 [4] AnnotationDbi_1.38.1   Biobase_2.36.2         GenomicRanges_1.28.3  
 [7] GenomeInfoDb_1.12.1    IRanges_2.10.2         S4Vectors_0.14.3      
[10] AnnotationHub_2.8.1    BiocGenerics_0.22.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11                  BiocInstaller_1.26.0         
 [3] compiler_3.4.0                XVector_0.16.0               
 [5] ProtGenerics_1.8.0            bitops_1.0-6                 
 [7] tools_3.4.0                   zlibbioc_1.22.0              
 [9] biomaRt_2.32.0                digest_0.6.12                
[11] lattice_0.20-35               RSQLite_1.1-2                
[13] memoise_1.1.0                 Matrix_1.2-10                
[15] DelayedArray_0.2.7            shiny_1.0.3                  
[17] DBI_0.6-1                     curl_2.6                     
[19] yaml_2.1.14                   GenomeInfoDbData_0.99.0      
[21] rtracklayer_1.36.3            httr_1.2.1                   
[23] Biostrings_2.44.1             grid_3.4.0                   
[25] R6_2.2.1                      XML_3.98-1.7                 
[27] BiocParallel_1.10.1           matrixStats_0.52.2           
[29] Rsamtools_1.28.0              htmltools_0.3.6              
[31] GenomicAlignments_1.12.1      SummarizedExperiment_1.6.3   
[33] mime_0.5                      interactiveDisplayBase_1.14.0
[35] xtable_1.8-2                  httpuv_1.3.3                 
[37] lazyeval_0.2.0                RCurl_1.95-4.8

 

So, EnsDb databases with a DBSCHEMAVERSION >= 2.1 now also contain gene descriptions.
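The version requirement can be checked programmatically before asking for the description column. The following is only a sketch: it assumes that metadata(edb) returns a data.frame with name and value columns, and it uses a mock table with the values printed above so that it runs without a database. The helper name has_gene_descriptions is hypothetical.

```r
# Hypothetical helper: decide from an EnsDb metadata table whether gene
# descriptions should be present (DBSCHEMAVERSION >= 2.1).
has_gene_descriptions <- function(md) {
  v <- md$value[md$name == "DBSCHEMAVERSION"]
  # Assumption: a missing entry means the database predates schema 2.1.
  if (length(v) == 0 || is.na(v[1])) return(FALSE)
  # numeric_version compares component-wise, so "2.10" would rank above "2.9".
  numeric_version(v[1]) >= numeric_version("2.1")
}

# Mock metadata mirroring fields printed for the Ensembl 87 database.
md <- data.frame(
  name = c("ensembl_version", "DBSCHEMAVERSION"),
  value = c("87", "2.1"),
  stringsAsFactors = FALSE
)
has_gene_descriptions(md)
```

With a real database one would presumably call has_gene_descriptions(metadata(edb)) instead of passing the mock table.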

Hope this helps.

Julien Roux (Switzerland) wrote 5 months ago:

Great, thanks for the quick answer!
