Hi everyone,
I am finding that bitr returns a different number of gene IDs depending on which keytype I use. Could anyone shed some light?
For example...
library(clusterProfiler)
library(org.Hs.eg.db)
sample_list <- c("A2M", "ABL1", "ADCYS", "AGPAT2")
sample_gene <- bitr(sample_list, fromType = "SYMBOL", toType = "ENSEMBL", OrgDb = "org.Hs.eg.db")
sample_gene2 <- bitr(sample_list, fromType = "SYMBOL", toType = "ENTREZID", OrgDb = "org.Hs.eg.db")
# sample_list has length 4, but each conversion returns a data frame with only 3 rows: the ADCYS gene is missing.
sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
[3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
[5] LC_TIME=English_Australia.utf8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.15.0 AnnotationDbi_1.58.0 IRanges_2.30.1 S4Vectors_0.34.0
[5] Biobase_2.56.0 BiocGenerics_0.42.0 clusterProfiler_4.4.4
loaded via a namespace (and not attached):
[1] nlme_3.1-157 bitops_1.0-7 ggtree_3.4.4 enrichplot_1.16.2
[5] bit64_4.0.5 RColorBrewer_1.1-3 httr_1.4.4 GenomeInfoDb_1.32.4
[9] tools_4.2.1 utf8_1.2.2 R6_2.5.1 lazyeval_0.2.2
[13] DBI_1.1.3 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[17] gridExtra_2.3 bit_4.0.4 compiler_4.2.1 cli_3.4.1
[21] scatterpie_0.1.8 shadowtext_0.1.2 scales_1.2.1 yulab.utils_0.0.5
[25] stringr_1.4.1 digest_0.6.29 DOSE_3.22.1 XVector_0.36.0
[29] pkgconfig_2.0.3 fastmap_1.1.0 rlang_1.0.6 rstudioapi_0.14
[33] RSQLite_2.2.17 gridGraphics_0.5-1 generics_0.1.3 farver_2.1.1
[37] jsonlite_1.8.0 BiocParallel_1.30.3 GOSemSim_2.22.0 dplyr_1.0.10
[41] RCurl_1.98-1.8 magrittr_2.0.3 ggplotify_0.1.0 GO.db_3.15.0
[45] GenomeInfoDbData_1.2.8 patchwork_1.1.2 Matrix_1.4-1 Rcpp_1.0.9
[49] munsell_0.5.0 fansi_1.0.3 ape_5.6-2 viridis_0.6.2
[53] lifecycle_1.0.2 stringi_1.7.8 ggraph_2.0.6 MASS_7.3-57
[57] zlibbioc_1.42.0 plyr_1.8.7 qvalue_2.28.0 grid_4.2.1
[61] blob_1.2.3 parallel_4.2.1 ggrepel_0.9.1 DO.db_2.9
[65] crayon_1.5.2 lattice_0.20-45 graphlayouts_0.8.1 Biostrings_2.64.1
[69] splines_4.2.1 KEGGREST_1.36.3 pillar_1.8.1 fgsea_1.22.0
[73] igraph_1.3.5 reshape2_1.4.4 codetools_0.2-18 fastmatch_1.1-3
[77] glue_1.6.2 ggfun_0.0.7 downloader_0.4 data.table_1.14.2
[81] BiocManager_1.30.18 treeio_1.20.2 png_0.1-7 vctrs_0.4.1
[85] tweenr_2.0.2 gtable_0.3.1 purrr_0.3.4 polyclip_1.10-0
[89] tidyr_1.2.1 assertthat_0.2.1 cachem_1.0.6 ggplot2_3.3.6
[93] ggforce_0.3.4 tidygraph_1.2.2 tidytree_0.4.1 viridisLite_0.4.1
[97] tibble_3.1.8 aplot_0.1.7 memoise_2.0.1
In the example above, both conversions return fewer rows than there are input symbols. However, in the sample file I am working with, converting from SYMBOL to ENSEMBL actually yielded more: 8962 ENSEMBL IDs from 8090 SYMBOLs. Thank you!
I cannot find any "ADCYS" gene in human, so it is normal that it is discarded.
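Both effects (fewer rows and more rows) can be reproduced with a plain lookup table. The following is a minimal base-R sketch that assumes bitr behaves like an inner join against the annotation table: an input symbol with no match disappears, and a symbol with several matches (a 1:many mapping, which AnnotationDbi reports for SYMBOL to ENSEMBL) contributes several rows. The symbols GENE1 to GENE4 and the IDs ID_1 to ID_4b are hypothetical placeholders, not real genes or Ensembl IDs.

```r
# Hypothetical mapping table: GENE3 has no entry, GENE4 maps to two IDs.
toy_map <- data.frame(
  SYMBOL = c("GENE1", "GENE2", "GENE4", "GENE4"),
  TARGET = c("ID_1", "ID_2", "ID_4a", "ID_4b")
)

sample_list <- c("GENE1", "GENE2", "GENE3", "GENE4")

# Inner join: keeps every matching row and silently drops unmatched symbols
mapped <- merge(data.frame(SYMBOL = sample_list), toy_map, by = "SYMBOL")

# Input symbols that failed to map
unmapped <- setdiff(sample_list, mapped$SYMBOL)

# Input symbols that mapped to more than one target ID
multi <- names(which(table(mapped$SYMBOL) > 1))

print(nrow(mapped))  # 4 rows, even though one symbol was lost
print(unmapped)      # "GENE3"
print(multi)         # "GENE4"
```

On real data, the same `setdiff()` and `table()` checks can be run on the data frame returned by bitr (its columns are named after `fromType` and `toType`). Passing `drop = FALSE` to bitr should also keep unmapped symbols as rows with `NA` in the target column, which makes them easy to spot.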