Hello all,
I'm trying to map sheep genes to their human homologs with biomaRt, but I'm running into an issue and haven't found a specific solution in my searches. getLDS() keeps throwing an error with the mart I want to use (sheep), but it runs fine with a second sheep mart (sheep2) that doesn't match my IDs.
Retrieving gene names and IDs works without issue when using an old archive:
library(biomaRt)
sheep <- useMart("ensembl", dataset = "oaries_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
sheep2 <- useMart("ensembl", dataset = "oarambouillet_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl", host = "https://dec2021.archive.ensembl.org/")
mapped_ids <- getLDS(attributes = c("ensembl_gene_id", "external_gene_name"),
                     filters = "ensembl_gene_id",
                     values = sheep_ensembl_ids,
                     mart = sheep,
                     attributesL = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
                     martL = human)
This returns the following error:
Error in getLDS(attributes = c("ensembl_gene_id", "external_gene_name"), :
Query ERROR: caught BioMart::Exception::Query: returning undef ... missing attributes for your exportable?
I noticed by accident that the call goes through when I use the wrong sheep dataset (oarambouillet_gene_ensembl):
mapped_ids2 <- getLDS(attributes = c("ensembl_gene_id", "external_gene_name"),
                      filters = "ensembl_gene_id",
                      values = sheep_ensembl_ids,
                      mart = sheep2,
                      attributesL = c("ensembl_gene_id", "external_gene_name", "gene_biotype"),
                      martL = human)
The function goes through, although nothing is returned because my data uses oaries_gene_ensembl IDs. Both the human mart and the second sheep mart are shown as "Large Mart" objects, while my oaries_gene_ensembl mart is shown as a "Formal class Mart". I don't know if that matters.
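One way to check whether the "Large Mart" versus "Formal class Mart" labels reflect a real difference is to compare the class and in-memory size of each object. The sketch below is only illustrative: `describe_mart` is a hypothetical helper (not part of biomaRt), and it rests on the assumption, not confirmed in this thread, that the labels come from the IDE's environment pane rather than from biomaRt itself.

```r
# Hypothetical helper: summarise the class and size of any R object.
# Not a biomaRt function; written only to compare the three marts.
describe_mart <- function(mart) {
  data.frame(
    class   = class(mart)[1],                                  # e.g. "Mart"
    is_s4   = isS4(mart),                                      # TRUE for S4 objects
    size_mb = round(as.numeric(object.size(mart)) / 1024^2, 2) # size in MiB
  )
}

# Usage with the objects from this post (they must already exist):
# rbind(sheep  = describe_mart(sheep),
#       sheep2 = describe_mart(sheep2),
#       human  = describe_mart(human))
```

If all three rows report the same class and differ only in size, the label difference is probably cosmetic and unrelated to the getLDS() error.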
For reproducibility, you can use the IDs below, or remove the filters and values arguments from getLDS.
sheep_ensembl_ids = c("ENSOARG00000000002", "ENSOARG00000000006", "ENSOARG00000000016", "ENSOARG00000000019", "ENSOARG00000000022")
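A possible diagnostic, given that the error mentions missing attributes, is to check whether the sheep mart exposes human-homolog attributes directly. This is a minimal sketch under assumptions: `find_homolog_attributes` is a hypothetical helper, and whether oaries_gene_ensembl in this archive actually carries `hsapiens_homolog_*` attributes has not been verified here.

```r
# Hypothetical helper (not part of biomaRt): filter an attribute table
# down to the homolog attributes of one species.
find_homolog_attributes <- function(attr_table, species = "hsapiens") {
  # attr_table is expected to look like the output of
  # biomaRt::listAttributes(): a data frame with a `name` column.
  pattern <- paste0("^", species, "_homolog_")
  attr_table[grepl(pattern, attr_table$name), , drop = FALSE]
}

# Usage with the mart from this post (needs network access):
# homolog_attrs <- find_homolog_attributes(listAttributes(sheep))
# head(homolog_attrs)
#
# If those attributes are present, getBM() on the single sheep mart
# may serve as an alternative to getLDS():
# getBM(attributes = c("ensembl_gene_id",
#                      "hsapiens_homolog_ensembl_gene",
#                      "hsapiens_homolog_associated_gene_name"),
#       filters = "ensembl_gene_id",
#       values = sheep_ensembl_ids,
#       mart = sheep)
```

The helper takes the attribute table rather than the mart itself, so the filtering step can be tried on any data frame without contacting the Ensembl server.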
Session Info
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Australia/Brisbane
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.58.2
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 utf8_1.2.4 generics_0.1.3 bitops_1.0-7
[5] xml2_1.3.6 RSQLite_2.3.5 stringi_1.8.3 hms_1.1.3
[9] digest_0.6.34 magrittr_2.0.3 evaluate_0.23 fastmap_1.1.1
[13] blob_1.2.4 progress_1.2.3 AnnotationDbi_1.64.1 GenomeInfoDb_1.38.6
[17] DBI_1.2.1 BiocManager_1.30.22 httr_1.4.7 fansi_1.0.6
[21] XML_3.99-0.16.1 Biostrings_2.70.2 cli_3.6.1 rlang_1.1.3
[25] crayon_1.5.2 dbplyr_2.4.0 XVector_0.42.0 Biobase_2.62.0
[29] bit64_4.0.5 yaml_2.3.8 cachem_1.0.8 tools_4.3.2
[33] memoise_2.0.1 dplyr_1.1.4 GenomeInfoDbData_1.2.11 filelock_1.0.3
[37] BiocGenerics_0.48.1 curl_5.2.0 vctrs_0.6.5 R6_2.5.1
[41] png_0.1-8 stats4_4.3.2 lifecycle_1.0.4 BiocFileCache_2.10.1
[45] zlibbioc_1.48.0 KEGGREST_1.42.0 stringr_1.5.1 S4Vectors_0.40.2
[49] IRanges_2.36.0 bit_4.0.5 pkgconfig_2.0.3 pillar_1.9.0
[53] data.table_1.15.0 glue_1.7.0 xfun_0.42 tibble_3.2.1
[57] tidyselect_1.2.0 knitr_1.45 rstudioapi_0.15.0 htmltools_0.5.7
[61] rmarkdown_2.25 compiler_4.3.2 prettyunits_1.2.0 RCurl_1.98-1.14
Thank you! I'm not very familiar with the mart's structure yet, so I wouldn't have figured that out easily. I'll try to follow up with Ensembl.