org.Ss.eg.db MAMDC4 maps to incorrect UNIPROT key
@a35b1e12

AnnotationDbi::select() on org.Ss.eg.db returns two matches for UNIPROT key P59083: PHPT1 (PHP14_PIG) and MAMDC4 (MAMDC4_PIG). I expect one. Cross-checking at uniprot.org, P59083 comes up as PHPT1 (PHP14_PIG) only, while MAMDC4 is listed under other accessions (F1RW01 among them). Checking at https://www.ncbi.nlm.nih.gov/search/, the two ENTREZIDs shown below resolve to the same genes that select() reports. So the P59083 to MAMDC4 mapping appears wrong, and it does not appear to come from uniprot.org.

A BLAST of the P59083 FASTA sequence hits many orthologs of PHPT1, but there are no hits to any other proteins within Sus scrofa, so MAMDC4 is not PHPT1. The evidence suggests that some database is wrong in mapping MAMDC4 to P59083, but I don't know which one. Since the mapping looks fine at both UniProt and Entrez, it seems plausible that the error is within the org.Ss.eg.db object itself.
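
One way to narrow this down from the R side, as a rough sketch I have not checked against other releases: look up the two Entrez Gene IDs in the reverse direction to see every UNIPROT accession the package stores for them, and print the package metadata to see which upstream sources and dates it was built from. This assumes org.Ss.eg.db is installed; `reverse_lookup` is a helper name made up for this post, not part of AnnotationDbi.

```r
# Sketch only: assumes AnnotationDbi and org.Ss.eg.db are installed.
# reverse_lookup() is a hypothetical helper, not a function from either package.
reverse_lookup <- function(db, entrez_ids) {
  # List every UNIPROT accession the package stores for each Entrez Gene ID
  AnnotationDbi::select(db,
                        keys = entrez_ids,
                        keytype = "ENTREZID",
                        columns = c("SYMBOL", "UNIPROT"))
}

if (requireNamespace("org.Ss.eg.db", quietly = TRUE)) {
  library(org.Ss.eg.db)
  # The two Entrez Gene IDs that select() returned for P59083
  print(reverse_lookup(org.Ss.eg.db, c("100513261", "126964416")))
  # Source databases and dates recorded when the package was built
  print(metadata(org.Ss.eg.db))
}
```

If P59083 shows up under MAMDC4 alongside its expected accessions, that would point to the gene-to-UniProt table the package was built from rather than to anything in my session.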

So I'm posting here as a starting point, since I have verified that UniProt and Entrez do not show the double hit.

Code to reproduce is below.



# Running inside RStudio:
BiocManager::install(c("AnnotationDbi", "org.Ss.eg.db"))
library(AnnotationDbi)
library(org.Ss.eg.db)  # provides the org.Ss.eg.db annotation object used below
AnnotationDbi::select(org.Ss.eg.db, keys="P59083", keytype="UNIPROT", columns=c("UNIPROT", "SYMBOL", "ENTREZID", "GENENAME"))
'select()' returned 1:many mapping between keys and columns
  UNIPROT SYMBOL  ENTREZID                       GENENAME
1  P59083 MAMDC4 100513261        MAM domain containing 4
2  P59083  PHPT1 126964416 phosphohistidine phosphatase 1
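
To flag this kind of double hit programmatically, here is a minimal base R sketch. It re-enters the two rows above by hand so it runs without the annotation package; on a real select() result the same lines should apply to the returned data frame, assuming it has UNIPROT and ENTREZID columns.

```r
# The select() result from above, typed in by hand for illustration
res <- data.frame(
  UNIPROT  = c("P59083", "P59083"),
  SYMBOL   = c("MAMDC4", "PHPT1"),
  ENTREZID = c("100513261", "126964416"),
  stringsAsFactors = FALSE
)

# Count distinct Entrez Gene IDs per UniProt accession
n_genes <- tapply(res$ENTREZID, res$UNIPROT, function(x) length(unique(x)))

# Accessions attached to more than one gene are candidates for a bad mapping
suspect <- names(n_genes)[n_genes > 1]
print(res[res$UNIPROT %in% suspect, ])
```

A 1:many result is not always an error, so anything flagged this way still needs checking against UniProt by hand.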

# sessionInfo()
R version 4.5.1 (2025-06-13 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default
  LAPACK version 3.12.1

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clusterProfiler_4.16.0 edgeR_4.6.3            limma_3.64.3           ggrepel_0.9.6          ggridges_0.5.6        
 [6] stringi_1.8.7          biomaRt_2.64.0         UniProt.ws_2.48.0      org.Mm.eg.db_3.21.0    org.Ss.eg.db_3.21.0   
[11] org.Hs.eg.db_3.21.0    AnnotationDbi_1.70.0   IRanges_2.42.0         S4Vectors_0.46.0       Biobase_2.68.0        
[16] BiocGenerics_0.54.0    generics_0.1.4         BiocManager_1.30.26    openxlsx_4.2.8         readxl_1.4.5          
[21] lubridate_1.9.4        forcats_1.0.0          stringr_1.5.1          dplyr_1.1.4            purrr_1.1.0           
[26] readr_2.1.5            tidyr_1.3.1            tibble_3.3.0           ggplot2_3.5.2          tidyverse_2.0.0       

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      rstudioapi_0.17.1       jsonlite_2.0.0          magrittr_2.0.3          ggtangle_0.0.7         
  [6] farver_2.1.2            fs_1.6.6                vctrs_0.6.5             memoise_2.0.1           ggtree_3.16.3          
 [11] BiocBaseUtils_1.10.0    progress_1.2.3          curl_7.0.0              cellranger_1.1.0        gridGraphics_0.5-1     
 [16] pROC_1.19.0.1           caret_7.0-1             parallelly_1.45.1       plyr_1.8.9              httr2_1.2.1            
 [21] cachem_1.1.0            igraph_2.1.4            lifecycle_1.0.4         iterators_1.0.14        pkgconfig_2.0.3        
 [26] gson_0.1.0              Matrix_1.7-3            R6_2.6.1                fastmap_1.2.0           GenomeInfoDbData_1.2.14
 [31] future_1.67.0           aplot_0.2.8             enrichplot_1.28.4       digest_0.6.37           patchwork_1.3.1        
 [36] RSQLite_2.4.3           filelock_1.0.3          timechange_0.3.0        httr_1.4.7              compiler_4.5.1         
 [41] bit64_4.6.0-1           withr_3.0.2             BiocParallel_1.42.1     DBI_1.2.3               rjsoncons_1.3.2        
 [46] R.utils_2.13.0          MASS_7.3-65             lava_1.8.1              rappdirs_0.3.3          ModelMetrics_1.2.2.2   
 [51] tools_4.5.1             ape_5.8-1               zip_2.3.3               future.apply_1.20.0     nnet_7.3-20            
 [56] R.oo_1.27.1             glue_1.8.0              nlme_3.1-168            GOSemSim_2.34.0         grid_4.5.1             
 [61] reshape2_1.4.4          fgsea_1.34.2            recipes_1.3.1           gtable_0.3.6            tzdb_0.5.0             
 [66] R.methodsS3_1.8.2       class_7.3-23            data.table_1.17.8       hms_1.1.3               xml2_1.4.0             
 [71] XVector_0.48.0          foreach_1.5.2           pillar_1.11.0           yulab.utils_0.2.1       splines_4.5.1          
 [76] treeio_1.32.0           BiocFileCache_2.16.1    lattice_0.22-7          survival_3.8-3          bit_4.6.0              
 [81] tidyselect_1.2.1        GO.db_3.21.0            locfit_1.5-9.12         Biostrings_2.76.0       statmod_1.5.0          
 [86] hardhat_1.4.2           timeDate_4041.110       UCSC.utils_1.4.0        lazyeval_0.2.2          ggfun_0.2.0            
 [91] codetools_0.2-20        qvalue_2.40.0           AnVILBase_1.2.0         ggplotify_0.1.2         cli_3.6.5              
 [96] rpart_4.1.24            Rcpp_1.1.0              GenomeInfoDb_1.44.2     globals_0.18.0          dbplyr_2.5.0           
[101] png_0.1-8               parallel_4.5.1          gower_1.0.2             blob_1.2.4              prettyunits_1.2.0      
[106] DOSE_4.2.0              listenv_0.9.1           tidytree_0.4.6          ipred_0.9-15            scales_1.4.0           
[111] prodlim_2025.04.28      crayon_1.5.3            rlang_1.1.6             fastmatch_1.1-6         cowplot_1.2.0          
[116] KEGGREST_1.48.1