Question: ensembldb and pseudogenes mapping to the same Ensembl ID
22 months ago, meeta.mistry (United States) wrote:

Hello,

I encountered a problem when mapping Ensembl genes to Entrez IDs and was wondering if there is a way around it. For a list of Ensembl gene IDs, I used the select function to return gene symbols and Entrez IDs:

common_genes <- select(EnsDb.Mmusculus.v79, keys=common, 
        columns=c("ENTREZID", "SYMBOL", "GENEID"), 
        keytype="GENEID")

Browsing through the table, I noticed duplicate matches (i.e. for a single Ensembl ID there are two Entrez IDs). I looked these IDs up in the Entrez database and found that the additional IDs belong to pseudogenes, which in fact have different gene symbols, but this is not reflected in the output.

For example:

               GENEID  ENTREZID SYMBOL
72 ENSMUSG00000000740    270106  Rpl13
73 ENSMUSG00000000740 100040416  Rpl13

The second Entrez ID is for Rpl13-ps6, which maps to ENSMUSG00000059776, so this table reports the mapping incorrectly.

Is there any way of identifying these pseudogenes using information stored in the database? Perhaps, if Entrez gene symbols are stored, we could use those to filter out pseudogenes?

Any help on this would be much appreciated. Thanks in advance.

Meeta

Answer: ensembldb and pseudogenes mapping to the same Ensembl ID
22 months ago, Johannes Rainer (Italy) wrote:

Dear Meeta,

Mapping between Entrez and Ensembl IDs is always problematic. EnsDb databases provide all the information from Ensembl for a specific release, and in release 79 (March 2015) this one gene was annotated to two Entrez identifiers. Unfortunately, EnsDb databases contain no additional information about Entrez genes (such as whether an Entrez gene is a pseudogene). For the mapping you could instead use the org.Mm.eg.db package, which uses annotations from NCBI:

> library(org.Mm.eg.db)
> select(org.Mm.eg.db, columns = c("ENTREZID", "SYMBOL", "ENSEMBL"), keys = "Rpl13", keytype = "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID            ENSEMBL
1  Rpl13   270106 ENSMUSG00000000740

 

Alternatively, use an EnsDb database for a more recent Ensembl release (the mapping seems to have been fixed there):

> library(AnnotationHub)
> edb <- query(AnnotationHub(), "EnsDb.Mmusculus.v90")[[1]]
snapshotDate(): 2017-10-27
loading from cache '/Users/jo//.AnnotationHub/64508'
> select(edb, columns = c("ENTREZID", "SYMBOL", "GENEID"), keys = "Rpl13", keytype = "SYMBOL")
  ENTREZID SYMBOL             GENEID
1   270106  Rpl13 ENSMUSG00000000740
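
If you are stuck with an older release, a rough way to spot and filter such cases is to flag Ensembl IDs with more than one Entrez ID and then keep only the pairs that an NCBI-based mapping also reports. The sketch below is only an illustration in base R: the data frames `ens` and `ncbi` are hand-typed toy tables built from the values quoted in this thread, and all object names are hypothetical. In practice `ens` would be the result of select() on the EnsDb and `ncbi` the result of select() on org.Mm.eg.db.

```r
# Toy copy of the select() result from the question (EnsDb v79).
ens <- data.frame(
  GENEID   = c("ENSMUSG00000000740", "ENSMUSG00000000740"),
  ENTREZID = c("270106", "100040416"),
  SYMBOL   = c("Rpl13", "Rpl13"),
  stringsAsFactors = FALSE
)

# Toy NCBI-side mapping, typed in from the values mentioned in this thread
# (assumption: a real table would come from select(org.Mm.eg.db, ...)).
ncbi <- data.frame(
  ENSEMBL  = c("ENSMUSG00000000740", "ENSMUSG00000059776"),
  ENTREZID = c("270106", "100040416"),
  SYMBOL   = c("Rpl13", "Rpl13-ps6"),
  stringsAsFactors = FALSE
)

# 1. Flag Ensembl IDs that map to more than one distinct Entrez ID.
n_entrez  <- tapply(ens$ENTREZID, ens$GENEID, function(x) length(unique(x)))
ambiguous <- names(n_entrez)[n_entrez > 1]

# 2. Keep only the Ensembl/Entrez pairs that the NCBI-side table also lists;
#    the SYMBOL column of the result is taken from the NCBI-side table.
confirmed <- merge(ens[, c("GENEID", "ENTREZID")], ncbi,
                   by.x = c("GENEID", "ENTREZID"),
                   by.y = c("ENSEMBL", "ENTREZID"))

print(ambiguous)
print(confirmed)
```

With the toy data, the pair involving Entrez ID 100040416 is dropped because the NCBI-side table assigns that ID to a different Ensembl gene. Whether dropping unconfirmed pairs is appropriate depends on the analysis, so treat this as a starting point rather than a general rule.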

 

Finally, my session info:

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin17.3.0/x86_64 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] ensembldb_2.2.0        AnnotationFilter_1.2.0 GenomicFeatures_1.30.0
 [4] GenomicRanges_1.30.1   GenomeInfoDb_1.14.0    AnnotationHub_2.10.1  
 [7] org.Mm.eg.db_3.5.0     AnnotationDbi_1.40.0   IRanges_2.12.0        
[10] S4Vectors_0.16.0       Biobase_2.38.0         BiocGenerics_0.24.0   
[13] BiocInstaller_1.28.0  

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.8.1    progress_1.1.2               
 [3] lattice_0.20-35               htmltools_0.3.6              
 [5] rtracklayer_1.38.2            yaml_2.1.16                  
 [7] interactiveDisplayBase_1.16.0 blob_1.1.0                   
 [9] XML_3.98-1.9                  rlang_0.1.6                  
[11] pillar_1.0.1                  DBI_0.7                      
[13] BiocParallel_1.12.0           bit64_0.9-7                  
[15] matrixStats_0.52.2            GenomeInfoDbData_1.0.0       
[17] ProtGenerics_1.10.0           stringr_1.2.0                
[19] zlibbioc_1.24.0               Biostrings_2.46.0            
[21] memoise_1.1.0                 biomaRt_2.34.1               
[23] httpuv_1.3.5                  curl_3.1                     
[25] Rcpp_0.12.14                  xtable_1.8-2                 
[27] DelayedArray_0.4.1            XVector_0.18.0               
[29] mime_0.5                      bit_1.1-12                   
[31] Rsamtools_1.30.0              RMySQL_0.10.13               
[33] digest_0.6.13                 stringi_1.1.6                
[35] shiny_1.0.5                   grid_3.4.3                   
[37] tools_3.4.3                   bitops_1.0-6                 
[39] magrittr_1.5                  lazyeval_0.2.1               
[41] RCurl_1.95-4.10               tibble_1.4.1                 
[43] RSQLite_2.0                   pkgconfig_2.0.1              
[45] Matrix_1.2-12                 prettyunits_1.0.2            
[47] assertthat_0.2.0              httr_1.3.1                   
[49] R6_2.2.2                      GenomicAlignments_1.14.1     
[51] compiler_3.4.3               

 

Answer: ensembldb and pseudogenes mapping to the same Ensembl ID
22 months ago, meeta.mistry (United States) wrote:

Hi Johannes,

Thank you for your quick reply! Both of those alternatives are good to know and very helpful since I use this package often for cross-database annotations.

Best,

Meeta
