bitr returns an unexpected number of gene IDs after conversion

Hi everyone,

I am finding that bitr returns different numbers of gene IDs depending on which keytype I use. Could anyone shed some light?

For example...

library(clusterProfiler)

sample_list <- c("A2M","ABL1","ADCYS","AGPAT2")

sample_gene <- bitr(sample_list, fromType="SYMBOL", toType="ENSEMBL", OrgDb="")
sample_gene2 <- bitr(sample_list, fromType="SYMBOL", toType="ENTREZID", OrgDb="")
# sample_list has length 4, but both conversions return only 3 rows: the ADCYS gene is missing.


R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.utf8  LC_CTYPE=English_Australia.utf8   
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1]   AnnotationDbi_1.58.0  IRanges_2.30.1        S4Vectors_0.34.0     
[5] Biobase_2.56.0        BiocGenerics_0.42.0   clusterProfiler_4.4.4

loaded via a namespace (and not attached):
 [1] nlme_3.1-157           bitops_1.0-7           ggtree_3.4.4           enrichplot_1.16.2     
 [5] bit64_4.0.5            RColorBrewer_1.1-3     httr_1.4.4             GenomeInfoDb_1.32.4   
 [9] tools_4.2.1            utf8_1.2.2             R6_2.5.1               lazyeval_0.2.2        
[13] DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.1.2      
[17] gridExtra_2.3          bit_4.0.4              compiler_4.2.1         cli_3.4.1             
[21] scatterpie_0.1.8       shadowtext_0.1.2       scales_1.2.1           yulab.utils_0.0.5     
[25] stringr_1.4.1          digest_0.6.29          DOSE_3.22.1            XVector_0.36.0        
[29] pkgconfig_2.0.3        fastmap_1.1.0          rlang_1.0.6            rstudioapi_0.14       
[33] RSQLite_2.2.17         gridGraphics_0.5-1     generics_0.1.3         farver_2.1.1          
[37] jsonlite_1.8.0         BiocParallel_1.30.3    GOSemSim_2.22.0        dplyr_1.0.10          
[41] RCurl_1.98-1.8         magrittr_2.0.3         ggplotify_0.1.0        GO.db_3.15.0          
[45] GenomeInfoDbData_1.2.8 patchwork_1.1.2        Matrix_1.4-1           Rcpp_1.0.9            
[49] munsell_0.5.0          fansi_1.0.3            ape_5.6-2              viridis_0.6.2         
[53] lifecycle_1.0.2        stringi_1.7.8          ggraph_2.0.6           MASS_7.3-57           
[57] zlibbioc_1.42.0        plyr_1.8.7             qvalue_2.28.0          grid_4.2.1            
[61] blob_1.2.3             parallel_4.2.1         ggrepel_0.9.1          DO.db_2.9             
[65] crayon_1.5.2           lattice_0.20-45        graphlayouts_0.8.1     Biostrings_2.64.1     
[69] splines_4.2.1          KEGGREST_1.36.3        pillar_1.8.1           fgsea_1.22.0          
[73] igraph_1.3.5           reshape2_1.4.4         codetools_0.2-18       fastmatch_1.1-3       
[77] glue_1.6.2             ggfun_0.0.7            downloader_0.4         data.table_1.14.2     
[81] BiocManager_1.30.18    treeio_1.20.2          png_0.1-7              vctrs_0.4.1           
[85] tweenr_2.0.2           gtable_0.3.1           purrr_0.3.4            polyclip_1.10-0       
[89] tidyr_1.2.1            assertthat_0.2.1       cachem_1.0.6           ggplot2_3.3.6         
[93] ggforce_0.3.4          tidygraph_1.2.2        tidytree_0.4.1         viridisLite_0.4.1     
[97] tibble_3.1.8           aplot_0.1.7            memoise_2.0.1

In the example above, both conversions return fewer IDs than the input. However, in the sample file I am working with, converting from SYMBOL to ENSEMBL actually yielded more IDs (8962 ENSEMBL IDs from 8090 SYMBOLs).

Thank you
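A plausible cause of getting more rows than input symbols is that one symbol can map to several Ensembl gene IDs, in which case bitr() returns one row per symbol/ID pair. The base-R sketch below checks for this on a bitr()-style data frame. The symbols and Ensembl-like IDs in it are hypothetical placeholders, not real annotation data, and the commented bitr() call assumes the human org.Hs.eg.db package as the OrgDb.

```r
# Hypothetical bitr()-style result: one row per SYMBOL/ENSEMBL pair.
# With real data it might come from a call such as
#   bitr(symbols, fromType = "SYMBOL", toType = "ENSEMBL", OrgDb = org.Hs.eg.db)
# ('symbols' and org.Hs.eg.db are assumptions; the IDs below are placeholders).
mapped <- data.frame(
  SYMBOL  = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  ENSEMBL = c("ENSG_1", "ENSG_2", "ENSG_3", "ENSG_4"),
  stringsAsFactors = FALSE
)

# Number of target IDs returned for each input symbol
ids_per_symbol <- table(mapped$SYMBOL)

# Symbols that mapped to more than one target ID (a 1:many mapping)
multi <- names(ids_per_symbol)[ids_per_symbol > 1]
print(multi)

# More rows than unique input symbols indicates 1:many mappings
print(nrow(mapped) > length(unique(mapped$SYMBOL)))

# Keep only the first target ID per symbol, if a 1:1 table is required
first_only <- mapped[!duplicated(mapped$SYMBOL), ]
print(nrow(first_only))
```

Keeping the first ID per symbol is an arbitrary choice made for illustration; which Ensembl ID to keep, if any, depends on the downstream analysis.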

I cannot find any gene named "ADCYS" in human, so it is expected that it is discarded.

You didn't show all of the output:

> bitr(sample_list, "SYMBOL","ENTREZID","")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID
1    A2M        2
2   ABL1       25
4 AGPAT2    10555
Warning message:
In bitr(sample_list, "SYMBOL", "ENTREZID", "") :
  25% of input gene IDs are fail to map...

Which seems pretty self-explanatory? Also, bitr is just a wrapper around select(), which would have shown you exactly which keys don't map, rather than only reporting the percentage that failed.

> select(, sample_list, "ENTREZID", "SYMBOL")
'select()' returned 1:1 mapping between keys and columns
  SYMBOL ENTREZID
1    A2M        2
2   ABL1       25
3  ADCYS     <NA>
4 AGPAT2    10555
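Because select() keeps unmapped keys as rows with NA, the failed IDs can be pulled out directly. A minimal base-R sketch, using a data frame that reproduces the select() output above; the commented select() call assumes org.Hs.eg.db as the OrgDb.

```r
# Reproduces the select() result above; with real data it could come from
#   AnnotationDbi::select(org.Hs.eg.db, keys = sample_list,
#                         columns = "ENTREZID", keytype = "SYMBOL")
# (org.Hs.eg.db is an assumption about which OrgDb was used).
res <- data.frame(
  SYMBOL   = c("A2M", "ABL1", "ADCYS", "AGPAT2"),
  ENTREZID = c("2", "25", NA, "10555"),
  stringsAsFactors = FALSE
)

# Keys that did not map keep NA in the target column
unmapped <- res$SYMBOL[is.na(res$ENTREZID)]
print(unmapped)

# Share of keys that failed (compare with the 25% in bitr()'s warning)
pct_failed <- 100 * length(unmapped) / nrow(res)
print(pct_failed)

# Dropping the NA rows leaves the 3 rows that bitr() returned
mapped <- res[!is.na(res$ENTREZID), ]
print(nrow(mapped))
```

With a bitr() result in hand instead, base R's setdiff(sample_list, result$SYMBOL) lists the input IDs that were dropped, where `result` stands for whatever the bitr() output was assigned to.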

