Hello,
I'm using org.Hs.eg.db (3.12.0) to map my list of genes (ENSEMBL IDs) to their SYMBOL. However, when I run the command below, I get the message 'select()' returned 1:many mapping between keys and columns, meaning that some ENSEMBL IDs appear more than once in the result, such as ENSG00000004866:
ENSG00000004866 ST7
ENSG00000004866 ST7-OT3
On the Ensembl website, it is mentioned that ST7-OT3 (NCBI gene (formerly Entrezgene) record; description: ST7 overlapping transcript 3) is an external reference matched to gene ENSG00000004866.
Why does it appear in my list? How can I handle this to get a 1:1 mapping?
Thanks in advance!
ann <- select(org.Hs.eg.db,keys=rownames(dge$counts),keytype="ENSEMBL",columns=c("SYMBOL"))
# 'select()' returned 1:many mapping between keys and columns
sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] taRifx_1.0.6.2 arsenal_3.6.2 reshape2_1.4.4
[4] Glimma_2.0.0 knitr_1.33 biomaRt_2.46.3
[7] Homo.sapiens_1.3.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 org.Hs.eg.db_3.12.0
[10] GO.db_3.12.1 OrganismDbi_1.32.0 GenomicFeatures_1.42.3
[13] AnnotationDbi_1.52.0 mixOmics_6.12.2 lattice_0.20-41
[16] MASS_7.3-53 RColorBrewer_1.1-2 magrittr_2.0.1
[19] openxlsx_4.2.3 forcats_0.5.1 stringr_1.4.0
[22] dplyr_1.0.5 purrr_0.3.4 readr_1.4.0
[25] tidyr_1.1.3 tibble_3.1.1 ggplot2_3.3.3
[28] tidyverse_1.3.1 DESeq2_1.30.1 SummarizedExperiment_1.18.2
[31] DelayedArray_0.16.3 MatrixGenerics_1.2.1 matrixStats_0.58.0
[34] Matrix_1.2-18 Biobase_2.50.0 GenomicRanges_1.40.0
[37] GenomeInfoDb_1.26.7 IRanges_2.24.1 S4Vectors_0.28.1
[40] BiocGenerics_0.36.1 edgeR_3.32.1 limma_3.46.0
You'll probably find the help page of select very useful. It also documents mapIds, which has the argument multiVals to decide what to do in those cases.
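As a rough illustration of getting to a 1:1 mapping, here is a minimal sketch in base R. The data frame is a toy stand-in for the output of select(), containing only the two rows quoted in the question, and the name ann_unique is made up for this example. It keeps the first SYMBOL reported for each ENSEMBL ID, which is one possible policy and not necessarily the right one for every analysis. The commented-out mapIds() call shows what should be the comparable one-step route with multiVals = "first", assuming org.Hs.eg.db is loaded and dge is the object from the question.

```r
# Toy stand-in for the data frame returned by select(); the two rows are
# the ones quoted in the question.
ann <- data.frame(
  ENSEMBL = c("ENSG00000004866", "ENSG00000004866"),
  SYMBOL  = c("ST7", "ST7-OT3"),
  stringsAsFactors = FALSE
)

# Keep only the first row seen for each ENSEMBL ID, dropping later duplicates.
ann_unique <- ann[!duplicated(ann$ENSEMBL), ]

# With the annotation package loaded, mapIds() can apply a similar policy
# directly (not run here because it needs org.Hs.eg.db and the dge object):
# sym <- mapIds(org.Hs.eg.db, keys = rownames(dge$counts),
#               keytype = "ENSEMBL", column = "SYMBOL", multiVals = "first")
```

Other values of multiVals (for example "list" or "filter") keep all matches or drop ambiguous keys instead, so the choice depends on whether secondary symbols such as ST7-OT3 matter for the downstream analysis.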