Issue "one-to-many" with org.Hs.eg.db
Nathalie • 0
@6a5d5618
Last seen 3.5 years ago
France

Hello,

I'm using org.Hs.eg.db (3.12.0) to map my list of genes (ENSEMBL IDs) to their SYMBOL. However, when I execute the command below, I get this message: 'select()' returned 1:many mapping between keys and columns. This means that some ENSEMBL IDs appear more than once in the result, such as ENSG00000004866:

ENSG00000004866 ST7

ENSG00000004866 ST7-OT3

On the Ensembl website, it's mentioned that ST7-OT3 (NCBI gene (formerly Entrezgene) record; description: ST7 overlapping transcript 3) is an external reference matched to gene ENSG00000004866.

Why does it appear in my list? How can I handle this to get a 1:1 mapping?

Thanks in advance!

ann <- select(org.Hs.eg.db, keys = rownames(dge$counts), keytype = "ENSEMBL", columns = "SYMBOL")
# 'select()' returned 1:many mapping between keys and columns



sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] taRifx_1.0.6.2                          arsenal_3.6.2                           reshape2_1.4.4                         
 [4] Glimma_2.0.0                            knitr_1.33                              biomaRt_2.46.3                         
 [7] Homo.sapiens_1.3.1                      TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.12.0                    
[10] GO.db_3.12.1                            OrganismDbi_1.32.0                      GenomicFeatures_1.42.3                 
[13] AnnotationDbi_1.52.0                    mixOmics_6.12.2                         lattice_0.20-41                        
[16] MASS_7.3-53                             RColorBrewer_1.1-2                      magrittr_2.0.1                         
[19] openxlsx_4.2.3                          forcats_0.5.1                           stringr_1.4.0                          
[22] dplyr_1.0.5                             purrr_0.3.4                             readr_1.4.0                            
[25] tidyr_1.1.3                             tibble_3.1.1                            ggplot2_3.3.3                          
[28] tidyverse_1.3.1                         DESeq2_1.30.1                           SummarizedExperiment_1.18.2            
[31] DelayedArray_0.16.3                     MatrixGenerics_1.2.1                    matrixStats_0.58.0                     
[34] Matrix_1.2-18                           Biobase_2.50.0                          GenomicRanges_1.40.0                   
[37] GenomeInfoDb_1.26.7                     IRanges_2.24.1                          S4Vectors_0.28.1                       
[40] BiocGenerics_0.36.1                     edgeR_3.32.1                            limma_3.46.0
org.Hs.eg.db • 1.7k views

Probably you'll find the help page of select very useful. It also documents mapIds, which has the argument multiVals to decide what to do in those cases.
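A minimal sketch of this suggestion, assuming the mapIds() interface from AnnotationDbi, which is where the multiVals argument lives. The key is just the example ID from the question, and keeping the "first" match is an arbitrary choice that should be checked against what you actually want:

```r
library(org.Hs.eg.db)  # also attaches AnnotationDbi

# Example key taken from the question; replace with rownames(dge$counts)
ens_ids <- c("ENSG00000004866")

# One symbol per key: keep only the first match (arbitrary tie-break)
sym_first <- mapIds(org.Hs.eg.db, keys = ens_ids, keytype = "ENSEMBL",
                    column = "SYMBOL", multiVals = "first")

# Alternatively, keep every match as a named list to inspect the duplicates
sym_all <- mapIds(org.Hs.eg.db, keys = ens_ids, keytype = "ENSEMBL",
                  column = "SYMBOL", multiVals = "list")
```

Other accepted values include "filter" (drop keys with several matches) and "asNA" (return NA for them), which may be safer than silently picking one symbol.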

Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Last seen 3 hours ago
Wageningen University, Wageningen, the …

... and since your primary IDs are Ensembl IDs, why don't you use an EnsDb (instead of an NCBI-based OrgDb) for annotating? This thread may be useful to get started with an EnsDb.
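As a rough sketch of what that could look like, assuming the EnsDb.Hsapiens.v86 package (my choice for illustration only; it may not match the Ensembl release your counts were annotated with, and other releases are available through AnnotationHub):

```r
library(EnsDb.Hsapiens.v86)  # also attaches ensembldb

edb <- EnsDb.Hsapiens.v86

# Example key taken from the question; replace with rownames(dge$counts)
ens_ids <- c("ENSG00000004866")

# Look up gene symbols by Ensembl gene ID
ann <- ensembldb::select(edb, keys = ens_ids, keytype = "GENEID",
                         columns = c("GENEID", "SYMBOL"))
```

Because the symbols here come from Ensembl's own gene table rather than from NCBI cross-references, it is worth checking `anyDuplicated(ann$GENEID)` afterwards instead of assuming the result is 1:1.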


Many thanks for pointing to EnsDb. I never knew there was a specific Ensembl data package!
