Hello, I am having a problem matching my Ensembl IDs to the corresponding gene symbols. When I use the filter "ensembl_gene_id_version", I get only 11 gene symbols back. When I use no filter, I get around 50,000 gene symbols, which is confusing considering I have only 15,000 Ensembl IDs. This creates a problem when I try to merge the counts data frame with the gene symbol output from getBM. I have tried different filters, and I am using the most up-to-date version of the dataset.
```
library(biomaRt)
mart <- useMart(biomart = "ensembl",
                dataset = "mmusculus_gene_ensembl",
                host = "useast.ensembl.org")
all_coding_genes <- getBM(attributes = c("mgi_symbol"),
                          values = row.names(res_ordered),
                          filters = "ensembl_gene_id_version",
                          mart = mart, uniqueRows = TRUE)
```

```
> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] tools     stats4    stats     graphics  grDevices
[6] utils     datasets  methods   base
other attached packages:
[1] fuzzyjoin_0.1.6
[2] org.Mm.eg.db_3.15.0
[3] AnnotationDbi_1.58.0
[4] RColorBrewer_1.1-3
[5] pheatmap_1.0.12
[6] colorspace_2.0-3
[7] EnhancedVolcano_1.14.0
[8] ggrepel_0.9.1
[9] forcats_0.5.1
[10] stringr_1.4.0
[11] purrr_0.3.4
[12] readr_2.1.2
[13] tidyr_1.2.0
[14] ggplot2_3.3.5
[15] tidyverse_1.3.1
[16] DESeq2_1.36.0
[17] SummarizedExperiment_1.26.1
[18] Biobase_2.56.0
[19] MatrixGenerics_1.8.0
[20] matrixStats_0.62.0
[21] GenomicRanges_1.48.0
[22] GenomeInfoDb_1.32.1
[23] IRanges_2.30.0
[24] S4Vectors_0.34.0
[25] BiocGenerics_0.42.0
[26] tibble_3.1.6
[27] dplyr_1.0.8
[28] R.utils_2.11.0
[29] R.oo_1.24.0
[30] R.methodsS3_1.8.1
[31] biomaRt_2.52.0
[32] BiocManager_1.30.18
loaded via a namespace (and not attached):
[1] fs_1.5.2 bitops_1.0-7
[3] lubridate_1.8.0 bit64_4.0.5
[5] filelock_1.0.2 progress_1.2.2
[7] httr_1.4.3 backports_1.4.1
[9] utf8_1.2.2 R6_2.5.1
[11] DBI_1.1.2 withr_2.5.0
[13] tidyselect_1.1.2 prettyunits_1.1.1
[15] bit_4.0.4 curl_4.3.2
[17] compiler_4.2.0 rvest_1.0.2
[19] cli_3.3.0 xml2_1.3.3
[21] DelayedArray_0.22.0 scales_1.2.0
[23] genefilter_1.78.0 rappdirs_0.3.3
[25] digest_0.6.29 XVector_0.36.0
[27] pkgconfig_2.0.3 dbplyr_2.2.0
[29] fastmap_1.1.0 readxl_1.4.0
[31] rlang_1.0.2 rstudioapi_0.13
[33] RSQLite_2.2.14 generics_0.1.2
[35] jsonlite_1.8.0 BiocParallel_1.30.0
[37] RCurl_1.98-1.6 magrittr_2.0.3
[39] GenomeInfoDbData_1.2.8 Matrix_1.4-1
[41] Rcpp_1.0.8.3 munsell_0.5.0
[43] fansi_1.0.3 lifecycle_1.0.1
[45] stringi_1.7.6 zlibbioc_1.42.0
[47] BiocFileCache_2.4.0 grid_4.2.0
[49] blob_1.2.3 parallel_4.2.0
[51] crayon_1.5.1 lattice_0.20-45
[53] Biostrings_2.64.0 haven_2.5.0
[55] splines_4.2.0 annotate_1.74.0
[57] hms_1.1.1 KEGGREST_1.36.0
[59] locfit_1.5-9.5 pillar_1.7.0
[61] geneplotter_1.74.0 reprex_2.0.1
[63] XML_3.99-0.9 glue_1.6.2
[65] modelr_0.1.8 png_0.1-7
[67] vctrs_0.4.1 tzdb_0.3.0
[69] cellranger_1.1.0 gtable_0.3.0
[71] assertthat_0.2.1 cachem_1.0.6
[73] xtable_1.8-4 broom_0.8.0
[75] survival_3.3-1 memoise_2.0.1
[77] ellipsis_0.3.2
```
Hello Charles, thank you for your response. Unfortunately the output is the same: I get a data frame of 0 obs. of 2 variables. It appears that the filter portion of the function is causing the problem.
You may also want to consider using an `EnsDb` annotation package to extract the annotation info (AFAIK the column `SYMBOL` is the MGI symbol). I downloaded the `EnsDb` (`EnsDb.Mmusculus.v106`) from `AnnotationHub`. For more on this, see e.g. "ensembldb EnsDb databases for Ensembl release 101 added to AnnotationHub", and for some code to get you started, see "EnsDb.Rnorvegicus for Rnor6". You will obviously have to adapt the code for your use case (i.e. for Mus musculus and v106).
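For what it's worth, here is a rough sketch of how the merge step could look. It rests on an assumption that is not confirmed anywhere in this thread: that the row names of `res_ordered` are versioned IDs (e.g. `ENSMUSG00000051951.5`) whose version numbers do not match the current release, which would explain why so few of them match. The names `strip_version` and `symbols_from_ensdb` are hypothetical helpers, the IDs, version suffixes and counts are toy data, and the EnsDb helper is only defined here, not called, since it needs an `EnsDb` object and the ensembldb package.

```r
# Toy stand-in for row.names(res_ordered); the version suffixes are made up.
count_ids <- c("ENSMUSG00000051951.5", "ENSMUSG00000025902.13",
               "ENSMUSG00000033845")

# Drop a trailing ".<digits>" version suffix; unversioned IDs pass through.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Hypothetical helper: look up symbols in an EnsDb object (e.g. one fetched
# from AnnotationHub). Only defined here, because calling it requires the
# ensembldb package and a downloaded EnsDb.
symbols_from_ensdb <- function(edb, gene_ids) {
  AnnotationDbi::select(edb, keys = gene_ids, keytype = "GENEID",
                        columns = c("GENEID", "SYMBOL"))
}

# Toy lookup table shaped like the data frame the helper above would return.
lookup <- data.frame(GENEID = c("ENSMUSG00000051951", "ENSMUSG00000025902"),
                     SYMBOL = c("Xkr4", "Sox17"))

# Toy counts table keyed by the unversioned gene ID.
counts <- data.frame(GENEID = strip_version(count_ids),
                     baseMean = c(10, 20, 30))

# all.x = TRUE keeps genes that have no symbol (SYMBOL is NA for those).
annotated <- merge(counts, lookup, by = "GENEID", all.x = TRUE)
print(annotated)
```

The key point is that the ID column is kept alongside the symbol, so there is something to merge on. With biomaRt the same idea would be to request `c("ensembl_gene_id", "mgi_symbol")` as attributes and filter on `"ensembl_gene_id"` using the stripped IDs.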