Hello, I am having a problem matching my Ensembl IDs to the corresponding gene symbols. When I use the filter "ensembl_gene_id_version", I get only 11 corresponding gene symbols. When I use no filter, I get around 50,000 gene symbols, which is confusing considering I have only 15,000 Ensembl IDs. This creates a problem when I try to merge the counts data frame with the gene symbol output from getBM. I have tried different filters, and I am using the most up-to-date version of the dataset.
```
mart <- useMart(biomart = 'ensembl',
                dataset = 'mmusculus_gene_ensembl',
                host = 'useast.ensembl.org')

all_coding_genes <- getBM(attributes = c("mgi_symbol"),
                          values = row.names(res_ordered),
                          filters = "ensembl_gene_id_version",
                          mart = mart,
                          uniqueRows = TRUE)
```
```
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] tools     stats4    stats     graphics  grDevices
[6] utils     datasets  methods   base
other attached packages:
[1] fuzzyjoin_0.1.6
[2] org.Mm.eg.db_3.15.0
[3] AnnotationDbi_1.58.0
[4] RColorBrewer_1.1-3
[5] pheatmap_1.0.12
[6] colorspace_2.0-3
[7] EnhancedVolcano_1.14.0
[8] ggrepel_0.9.1
[9] forcats_0.5.1
[10] stringr_1.4.0
[11] purrr_0.3.4
[12] readr_2.1.2
[13] tidyr_1.2.0
[14] ggplot2_3.3.5
[15] tidyverse_1.3.1
[16] DESeq2_1.36.0
[17] SummarizedExperiment_1.26.1
[18] Biobase_2.56.0
[19] MatrixGenerics_1.8.0
[20] matrixStats_0.62.0
[21] GenomicRanges_1.48.0
[22] GenomeInfoDb_1.32.1
[23] IRanges_2.30.0
[24] S4Vectors_0.34.0
[25] BiocGenerics_0.42.0
[26] tibble_3.1.6
[27] dplyr_1.0.8
[28] R.utils_2.11.0
[29] R.oo_1.24.0
[30] R.methodsS3_1.8.1
[31] biomaRt_2.52.0
[32] BiocManager_1.30.18
loaded via a namespace (and not attached):
[1] fs_1.5.2 bitops_1.0-7
[3] lubridate_1.8.0 bit64_4.0.5
[5] filelock_1.0.2 progress_1.2.2
[7] httr_1.4.3 backports_1.4.1
[9] utf8_1.2.2 R6_2.5.1
[11] DBI_1.1.2 withr_2.5.0
[13] tidyselect_1.1.2 prettyunits_1.1.1
[15] bit_4.0.4 curl_4.3.2
[17] compiler_4.2.0 rvest_1.0.2
[19] cli_3.3.0 xml2_1.3.3
[21] DelayedArray_0.22.0 scales_1.2.0
[23] genefilter_1.78.0 rappdirs_0.3.3
[25] digest_0.6.29 XVector_0.36.0
[27] pkgconfig_2.0.3 dbplyr_2.2.0
[29] fastmap_1.1.0 readxl_1.4.0
[31] rlang_1.0.2 rstudioapi_0.13
[33] RSQLite_2.2.14 generics_0.1.2
[35] jsonlite_1.8.0 BiocParallel_1.30.0
[37] RCurl_1.98-1.6 magrittr_2.0.3
[39] GenomeInfoDbData_1.2.8 Matrix_1.4-1
[41] Rcpp_1.0.8.3 munsell_0.5.0
[43] fansi_1.0.3 lifecycle_1.0.1
[45] stringi_1.7.6 zlibbioc_1.42.0
[47] BiocFileCache_2.4.0 grid_4.2.0
[49] blob_1.2.3 parallel_4.2.0
[51] crayon_1.5.1 lattice_0.20-45
[53] Biostrings_2.64.0 haven_2.5.0
[55] splines_4.2.0 annotate_1.74.0
[57] hms_1.1.1 KEGGREST_1.36.0
[59] locfit_1.5-9.5 pillar_1.7.0
[61] geneplotter_1.74.0 reprex_2.0.1
[63] XML_3.99-0.9 glue_1.6.2
[65] modelr_0.1.8 png_0.1-7
[67] vctrs_0.4.1 tzdb_0.3.0
[69] cellranger_1.1.0 gtable_0.3.0
[71] assertthat_0.2.1 cachem_1.0.6
[73] xtable_1.8-4 broom_0.8.0
[75] survival_3.3-1 memoise_2.0.1
[77] ellipsis_0.3.2
```

Hello Charles, thank you for your response. Unfortunately the output is the same: I get a data frame of 0 obs. with 2 variables. It appears that the filter portion of the function is causing the problem.
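
One thing that may be worth checking, assuming the row names of `res_ordered` carry version suffixes (something like `ENSMUSG00000000001.4`) from an annotation release that differs from the one the server currently hosts: the "ensembl_gene_id_version" filter only matches an ID whose version is identical, so most IDs could drop out. A minimal sketch of a workaround is to strip the suffix, filter on "ensembl_gene_id" instead, and also request the ID as an attribute so the result has a key to merge on. The helper names `strip_version` and `fetch_symbols` are hypothetical, and the example IDs are illustrative only.

```r
# Hypothetical helper: drop a trailing ".<digits>" version suffix from Ensembl IDs.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: query biomaRt with unversioned IDs.
# Needs network access and the biomaRt package; it is defined here but not called.
fetch_symbols <- function(ids) {
  mart <- biomaRt::useEnsembl(biomart = "ensembl",
                              dataset = "mmusculus_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "mgi_symbol"),
                 filters    = "ensembl_gene_id",
                 values     = unique(strip_version(ids)),
                 mart       = mart)
}

# Illustrative IDs only (not taken from the original data).
strip_version(c("ENSMUSG00000000001.4", "ENSMUSG00000000028"))

# Possible usage with the objects from the question (not run here):
# annot  <- fetch_symbols(row.names(res_ordered))
# res_df <- as.data.frame(res_ordered)
# res_df$ensembl_gene_id <- strip_version(row.names(res_df))
# merged <- merge(res_df, annot, by = "ensembl_gene_id", all.x = TRUE)
```

Returning "ensembl_gene_id" alongside "mgi_symbol" matters for the merge step: a query that returns only symbols has no column that links back to the counts table.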
You may also want to consider making use of the `EnsDb` annotation package. To extract some annotation info: (AFAIK the column `SYMBOL` is the MGI symbol). I downloaded the `EnsDb` (EnsDb.Mmusculus.v106) from the `AnnotationHub`. For more on this, see e.g. here (ensembldb EnsDb databases for Ensembl release 101 added to AnnotationHub), and for some code to get you started: EnsDb.Rnorvegicus for Rnor6. You obviously have to adapt the code accordingly for your use case (i.e. for Mus musculus, and for v106).
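
As a rough sketch of the `EnsDb` route described above, adapted for Mus musculus: the function below looks up an EnsDb through `AnnotationHub` and maps gene IDs to the `SYMBOL` column. This is not the code from the linked posts. The helper name `symbols_from_ensdb` is hypothetical, the query terms are an assumption about how the record is labelled in the hub, and the function needs network access plus the AnnotationHub and ensembldb packages, so it is defined here but not called.

```r
# Hypothetical helper: map Ensembl gene IDs to symbols using an EnsDb from AnnotationHub.
symbols_from_ensdb <- function(ids, release = "106") {
  ah  <- AnnotationHub::AnnotationHub()
  # Assumption: the hub record can be found with these search terms.
  hit <- AnnotationHub::query(ah, c("EnsDb", "Mus musculus", release))
  stopifnot(length(hit) >= 1)
  edb <- hit[[1]]                      # downloads and caches the database
  requireNamespace("ensembldb")        # provides the select() method for EnsDb
  # EnsDb gene IDs are unversioned, so strip any ".<digits>" suffix first.
  keys <- unique(sub("\\.[0-9]+$", "", ids))
  AnnotationDbi::select(edb,
                        keys    = keys,
                        keytype = "GENEID",
                        columns = c("GENEID", "SYMBOL"))
}

# Possible usage with the object from the question (not run here):
# annot <- symbols_from_ensdb(row.names(res_ordered))
```

The result keeps `GENEID` next to `SYMBOL`, so it can be merged back onto the counts table by gene ID.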