Error when using org.Hs.eg.db package
Masha
@masha-25091

Hello!

I am trying to use your package to annotate some proteins and have run into an error:

> library(org.Hs.eg.db)

# This works fine
> mapIds(org.Hs.eg.db, keys = "Q8NGN2", keytype="UNIPROT", column="ENTREZID")
'select()' returned 1:1 mapping between keys and columns
  Q8NGN2 
"219873"

# And this does not work
> mapIds(org.Hs.eg.db, keys = "P0DPD7", keytype="UNIPROT", column="ENTREZID")
Error in .testForValidKeys(x, keys, keytype, fks) :
  None of the keys entered are valid keys for 'UNIPROT'. Please use the keys method to see a listing of valid arguments.

UniProt lists a gene for both of these proteins (https://pir3.uniprot.org/uniprot/Q8NGN2, https://pir3.uniprot.org/uniprot/P0DPD7).

Is there a reason why I cannot annotate some of the proteins?

> sessionInfo( )

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C               LC_TIME=ru_RU.UTF-8        LC_COLLATE=ru_RU.UTF-8     LC_MONETARY=ru_RU.UTF-8   
 [6] LC_MESSAGES=ru_RU.UTF-8    LC_PAPER=ru_RU.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Hs.eg.db_3.12.0  AnnotationDbi_1.52.0 IRanges_2.24.1       S4Vectors_0.28.1     Biobase_2.50.0       BiocGenerics_0.36.0 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6          DBI_1.1.1           RSQLite_2.2.4       cachem_1.0.4        rlang_0.4.10        blob_1.2.1          vctrs_0.3.6        
 [8] tools_4.0.3         bit64_4.0.5         bit_4.0.4           fastmap_1.1.0       compiler_4.0.3      pkgconfig_2.0.3     BiocManager_1.30.10
[15] memoise_2.0.0

Thank you!

@james-w-macdonald-5106
United States

This is probably due to a mismatch between UniProt's mapping and NCBI's. If you search UniProt for 9718, you get ECE2. But if you search NCBI for P0DPD7, you get EEFIA. And the NCBI Gene page says:

Note: This gene was annotated as ECE2 (GeneID:9718) until recently in Annotation Release 108. It is now annotated as an independent gene adjacent to GeneID:1978 based on published reports (PMID 28520920). [05 Jul 2017]
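Since the org.Hs.eg.db version used here simply has no UNIPROT key for P0DPD7, one way to avoid the error when annotating a longer list is to check the IDs against keys() first and pass only the known ones to mapIds(). A minimal sketch: split_valid_keys() is a hypothetical helper, not part of AnnotationDbi, and the org.Hs.eg.db part only runs if that package is installed.

```r
# Hypothetical helper (not part of AnnotationDbi): split query IDs into those
# found among an annotation package's valid keys and those that are not.
split_valid_keys <- function(ids, valid_keys) {
  found <- ids %in% valid_keys
  list(valid = ids[found], missing = ids[!found])
}

# Toy example with a made-up set of valid keys
toy <- split_valid_keys(c("Q8NGN2", "P0DPD7"), c("Q8NGN2", "P04217"))
print(toy)

# With org.Hs.eg.db installed: map only the IDs the package knows about;
# the missing ones can then be looked up elsewhere (e.g. at UniProt).
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  uni <- AnnotationDbi::keys(db, keytype = "UNIPROT")
  hs_parts <- split_valid_keys(c("Q8NGN2", "P0DPD7"), uni)
  if (length(hs_parts$valid) > 0) {
    print(AnnotationDbi::mapIds(db, keys = hs_parts$valid,
                                keytype = "UNIPROT", column = "ENTREZID"))
  }
  print(hs_parts$missing)
}
```

Checking membership up front keeps the NCBI-based lookup and the list of IDs that need another source clearly separated.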

Huh. Somehow while querying around I ended up going from P0DPD7 to P0DPD6. Anyway, if you want to map UniProt IDs to NCBI Gene IDs, it's often easier to use UniProt's mappings rather than NCBI's (org.Hs.eg.db is built from NCBI data). One option is the UniProt.ws package, but it isn't working for me right now for some reason.

For simple queries like this it's just as easy to query the UniProt REST server directly. For example:

## your two IDs, plus the first 40 UniProt keys in org.Hs.eg.db

> z <- c("Q8NGN2","P0DPD7", keys(org.Hs.eg.db, "UNIPROT")[1:40])
> z
 [1] "Q8NGN2"     "P0DPD7"     "P04217"     "V9HWD8"     "P01023"    
 [6] "P18440"     "Q400J6"     "F5H5R8"     "A4Z6T7"     "P11245"    
[11] "A0A024R6P0" "P01011"     "P22760"     "A0A024R410" "Q13685"    
[16] "C9JEH3"     "F1T0I5"     "Q16613"     "P49588"     "P80404"    
[21] "X5D8S1"     "B2RUU2"     "B7XCW9"     "O95477"     "Q9BZC7"    
[26] "Q4LE27"     "Q99758"     "O75027"     "A0A087WW65" "A0A0S2Z2Z3"
[31] "A0A1U9X609" "Q8NE71"     "Q2L6I2"     "P78363"     "Q6AI28"    
[36] "A0A024R8E2" "P00519"     "Q59FK4"     "P19801"     "P42684"    
[41] "A0A089QDC1" "P16442"    

## now query

> zz <- read.table(paste0("https://www.uniprot.org/mapping/?from=ACC%2BID&to=P_ENTREZGENEID&format=tab&query=", paste(z, collapse = "%20")), header = TRUE)
> zz
         From        To
1      Q8NGN2    219873
2      P0DPD7 110599564
3      P04217         1
4      V9HWD8         1
5      P01023         2
6      P18440         9
7      Q400J6         9
8      F5H5R8         9
9      A4Z6T7        10
10     P11245        10
11 A0A024R6P0        12
12     P01011        12
13     P22760        13
14     Q13685        14
15     C9JEH3        14
16     F1T0I5        15
17     Q16613        15
18     P49588        16
19     P80404        18
20     X5D8S1        18
21     B2RUU2        19
22     B7XCW9        19
23     O95477        19
24     Q9BZC7        20
25     Q4LE27        21
26     Q99758        21
27     O75027        22
28 A0A087WW65        22
29 A0A0S2Z2Z3        22
30 A0A1U9X609        23
31     Q8NE71        23
32     Q2L6I2        23
33     P78363        24
34     Q6AI28        24
35 A0A024R8E2        25
36     P00519        25
37     Q59FK4        25
38     P19801        26
39     P42684        27
40 A0A089QDC1        28
41     P16442        28

The paste0() call builds a URL in the required format, which looks like

https://www.uniprot.org/mapping/?from=ACC%2BID&to=P_ENTREZGENEID&format=tab&query=Q8NGN2%20P0DPD7

You can paste that URL into any browser's address bar to bring up the table of results. Here, read.table() reads the same tab-delimited results straight into a data.frame.
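The URL construction can be wrapped in a small function so it is easy to reuse with other ID lists. This is a sketch that only reproduces the format of the example URL: build_uniprot_mapping_url() is a hypothetical helper, its default parameter values are copied from that URL, and the www.uniprot.org/mapping endpoint belongs to UniProt's legacy API, so it may no longer respond.

```r
# Hypothetical helper: build a legacy UniProt ID-mapping URL in the same
# format as the example above. Defaults (ACC+ID -> P_ENTREZGENEID, tab output)
# are taken from that URL; the endpoint itself may no longer be served.
build_uniprot_mapping_url <- function(ids,
                                      from = "ACC%2BID",
                                      to = "P_ENTREZGENEID",
                                      format = "tab") {
  base <- "https://www.uniprot.org/mapping/"
  # IDs are joined with URL-encoded spaces, as in the original query
  query <- paste(ids, collapse = "%20")
  paste0(base, "?from=", from, "&to=", to,
         "&format=", format, "&query=", query)
}

url <- build_uniprot_mapping_url(c("Q8NGN2", "P0DPD7"))
print(url)
```

With `z` from above, `read.table(build_uniprot_mapping_url(z), header = TRUE)` would issue the same request as the original call, provided the endpoint still answers.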
