bitr returning unexpected length of gene IDs after conversion.
@b1d3b7dd
Last seen 17 months ago
Australia

Hi everyone,

I am finding that bitr returns a different number of gene IDs than I pass in, and the number depends on which keytype I convert to. Could anyone shed some light on this?

For example...

library(clusterProfiler)
library(org.Hs.eg.db)

sample_list <- c("A2M","ABL1","ADCYS","AGPAT2")

sample_gene <- bitr(sample_list, fromType="SYMBOL", toType="ENSEMBL", OrgDb="org.Hs.eg.db")
sample_gene2 <- bitr(sample_list, fromType="SYMBOL", toType="ENTREZID", OrgDb="org.Hs.eg.db")
# sample_list has length 4, but both conversions return only 3 rows; ADCYS is missing.

sessionInfo()

R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] org.Hs.eg.db_3.15.0   AnnotationDbi_1.58.0  IRanges_2.30.1        S4Vectors_0.34.0     
[5] Biobase_2.56.0        BiocGenerics_0.42.0   clusterProfiler_4.4.4

loaded via a namespace (and not attached):
 [1] nlme_3.1-157           bitops_1.0-7           ggtree_3.4.4           enrichplot_1.16.2     
 [5] bit64_4.0.5            RColorBrewer_1.1-3     httr_1.4.4             GenomeInfoDb_1.32.4   
 [9] tools_4.2.1            utf8_1.2.2             R6_2.5.1               lazyeval_0.2.2        
[13] DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.1.2      
[17] gridExtra_2.3          bit_4.0.4              compiler_4.2.1         cli_3.4.1             
[21] scatterpie_0.1.8       shadowtext_0.1.2       scales_1.2.1           yulab.utils_0.0.5     
[25] stringr_1.4.1          digest_0.6.29          DOSE_3.22.1            XVector_0.36.0        
[29] pkgconfig_2.0.3        fastmap_1.1.0          rlang_1.0.6            rstudioapi_0.14       
[33] RSQLite_2.2.17         gridGraphics_0.5-1     generics_0.1.3         farver_2.1.1          
[37] jsonlite_1.8.0         BiocParallel_1.30.3    GOSemSim_2.22.0        dplyr_1.0.10          
[41] RCurl_1.98-1.8         magrittr_2.0.3         ggplotify_0.1.0        GO.db_3.15.0          
[45] GenomeInfoDbData_1.2.8 patchwork_1.1.2        Matrix_1.4-1           Rcpp_1.0.9            
[49] munsell_0.5.0          fansi_1.0.3            ape_5.6-2              viridis_0.6.2         
[53] lifecycle_1.0.2        stringi_1.7.8          ggraph_2.0.6           MASS_7.3-57           
[57] zlibbioc_1.42.0        plyr_1.8.7             qvalue_2.28.0          grid_4.2.1            
[61] blob_1.2.3             parallel_4.2.1         ggrepel_0.9.1          DO.db_2.9             
[65] crayon_1.5.2           lattice_0.20-45        graphlayouts_0.8.1     Biostrings_2.64.1     
[69] splines_4.2.1          KEGGREST_1.36.3        pillar_1.8.1           fgsea_1.22.0          
[73] igraph_1.3.5           reshape2_1.4.4         codetools_0.2-18       fastmatch_1.1-3       
[77] glue_1.6.2             ggfun_0.0.7            downloader_0.4         data.table_1.14.2     
[81] BiocManager_1.30.18    treeio_1.20.2          png_0.1-7              vctrs_0.4.1           
[85] tweenr_2.0.2           gtable_0.3.1           purrr_0.3.4            polyclip_1.10-0       
[89] tidyr_1.2.1            assertthat_0.2.1       cachem_1.0.6           ggplot2_3.3.6         
[93] ggforce_0.3.4          tidygraph_1.2.2        tidytree_0.4.1         viridisLite_0.4.1     
[97] tibble_3.1.8           aplot_0.1.7            memoise_2.0.1

In the example above, both conversions return fewer IDs than the original vector contains. However, in the sample file I am working with, converting from SYMBOL to ENSEMBL actually yielded more IDs than I started with (8962 ENSEMBL IDs from 8090 SYMBOLs). Thank you.

clusterProfiler • 874 views

I cannot find any human gene called "ADCYS", so it is normal that it is discarded.

@james-w-macdonald-5106
Last seen 4 hours ago
United States

You didn't show all of the output.

> bitr(sample_list, "SYMBOL","ENTREZID","org.Hs.eg.db")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID
1    A2M        2
2   ABL1       25
4 AGPAT2    10555
Warning message:
In bitr(sample_list, "SYMBOL", "ENTREZID", "org.Hs.eg.db") :
  25% of input gene IDs are fail to map...

That warning seems pretty self-explanatory. But bitr is just a wrapper around select, which would have shown you which keys don't map rather than only telling you that a percentage didn't map.

> select(org.Hs.eg.db, sample_list, "ENTREZID", "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID
1    A2M        2
2   ABL1       25
3  ADCYS     <NA>
4 AGPAT2    10555
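
Building on the select() output above, here is a minimal sketch of how one might list the symbols that fail to map and check whether any symbol maps to more than one target ID. It has not been checked against the poster's real data. The variable names (entrez, ensembl, unmapped, multi_mapped) are made up for illustration. The second half rests on the assumption that a single SYMBOL can map to several ENSEMBL IDs, which would be one possible reason for getting more ENSEMBL IDs back than SYMBOLs went in.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

sample_list <- c("A2M", "ABL1", "ADCYS", "AGPAT2")

# select() keeps unmapped keys as rows with NA, so they can be listed directly.
entrez <- AnnotationDbi::select(org.Hs.eg.db, keys = sample_list,
                                columns = "ENTREZID", keytype = "SYMBOL")
unmapped <- unique(entrez$SYMBOL[is.na(entrez$ENTREZID)])
print(unmapped)  # "ADCYS", consistent with the NA row shown above

# Count ENSEMBL IDs per symbol; any count above 1 means a one-to-many mapping,
# which makes the result longer than the set of input symbols.
ensembl <- AnnotationDbi::select(org.Hs.eg.db, keys = sample_list,
                                 columns = "ENSEMBL", keytype = "SYMBOL")
ids_per_symbol <- table(ensembl$SYMBOL[!is.na(ensembl$ENSEMBL)])
multi_mapped <- names(ids_per_symbol)[ids_per_symbol > 1]
print(multi_mapped)
```

Running the same two checks on the full symbol vector should show how much of the difference in length comes from unmapped symbols and how much from symbols with several ENSEMBL IDs.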