I am trying to use biomaRt with an archive version of Ensembl for reproducibility, since the project I am working on requires a specific Ensembl release (version 98). However, I am running into an error in getBM that doesn't occur with the main Ensembl mart. Here is some example code to reproduce the error. I've used Ensembl version 100, which is the current release, to demonstrate that the error doesn't seem to have anything to do with the query or the data it returns, since both getBM calls in this example should return exactly the same thing. Substituting any other version number for 100 produces the same result.
library(biomaRt)
mart_good <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="ensembl.org", dataset="hsapiens_gene_ensembl")
mart_bad <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="e100.ensembl.org", dataset="hsapiens_gene_ensembl")
# This works
getBM(attributes = "ensembl_gene_id", mart = mart_good,
filters = "ensembl_gene_id", values = "ENSG00000000003")
# This fails
getBM(attributes = "ensembl_gene_id", mart = mart_bad,
filters = "ensembl_gene_id", values = "ENSG00000000003")
When I run this code, the last line produces:
> getBM(attributes = "ensembl_gene_id", mart = mart_bad,
+ filters = "ensembl_gene_id", values = "ENSG00000000003")
NULL
Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery, :
The query to the BioMart webservice returned an invalid result.
The number of columns in the result table does not equal the number of attributes in the query.
Please report this on the support site at http://support.bioconductor.org
Can anyone tell me whether this is an issue with the package, or if I'm doing something wrong?
Session info:
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grDevices datasets parallel graphics stats4 stats utils methods base
other attached packages:
[1] biomaRt_2.44.0 assertthat_0.2.1 DESeq2_1.28.1 SummarizedExperiment_1.18.1 DelayedArray_0.14.0
[6] matrixStats_0.56.0 Biobase_2.48.0 fs_1.4.1 tximport_1.16.0 here_0.1
[11] readr_1.3.1 conflicted_1.0.4 tidyr_1.1.0 devtools_2.3.0 usethis_1.6.1
[16] openxlsx_4.1.5 magrittr_1.5 dplyr_0.8.5 foreach_1.5.1 plyr_1.8.6
[21] glue_1.4.1 stringr_1.4.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.0 IRanges_2.22.1
[26] ggplot2_3.3.0 S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-8 httr_1.4.1 progress_1.2.2 RColorBrewer_1.1-2 rprojroot_1.3-2
[7] tools_4.0.0 backports_1.1.7 R6_2.4.1 DBI_1.1.0 colorspace_1.4-2 withr_2.2.0
[13] tidyselect_1.1.0 prettyunits_1.1.1 processx_3.4.2 curl_4.3 bit_1.1-15.2 compiler_4.0.0
[19] cli_2.0.2 desc_1.2.0 scales_1.1.1 genefilter_1.70.0 askpass_1.1 callr_3.4.3
[25] rappdirs_0.3.1 digest_0.6.25 rmarkdown_2.1 XVector_0.28.0 base64enc_0.1-3 htmltools_0.4.0
[31] pkgconfig_2.0.3 sessioninfo_1.1.1 dbplyr_1.4.3 rlang_0.4.6 rstudioapi_0.11 RSQLite_2.2.0
[37] jsonlite_1.6.1 BiocParallel_1.22.0 zip_2.0.4 RCurl_1.98-1.2 GenomeInfoDbData_1.2.3 Matrix_1.3-0
[43] Rcpp_1.0.4.6 munsell_0.5.0 fansi_0.4.1 lifecycle_0.2.0 stringi_1.4.6 yaml_2.2.1
[49] zlibbioc_1.34.0 BiocFileCache_1.12.0 pkgbuild_1.0.8 grid_4.0.0 blob_1.2.1 crayon_1.3.4
[55] lattice_0.20-41 splines_4.0.0 annotate_1.66.0 hms_0.5.3 locfit_1.5-9.4 knitr_1.28
[61] ps_1.3.3 pillar_1.4.4 geneplotter_1.66.0 codetools_0.2-16 pkgload_1.0.2 XML_3.99-0.3
[67] evaluate_0.14 remotes_2.1.1 vctrs_0.3.0 testthat_2.3.2 openssl_1.4.1 gtable_0.3.0
[73] purrr_0.3.4 xfun_0.14 xtable_1.8-5 survival_3.1-12 tibble_3.0.1 iterators_1.0.12
[79] AnnotationDbi_1.50.0 memoise_1.1.0 ellipsis_0.3.1
The e[VERSION].ensembl.org hostname format is what I've used successfully in the past with biomaRt, e.g. here, which calls to here. If you type e98.ensembl.org into your browser, it redirects properly to sep2019.archive.ensembl.org. Is this no longer supported in biomaRt?
I don't know. I've never relied on the redirect. But not relying on it seems to work, so...
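For what it's worth, skipping the redirect might look like the sketch below. It assumes the release 98 archive lives at sep2019.archive.ensembl.org (the address the browser redirect mentioned above lands on) and that the installed biomaRt accepts an https:// prefix in host. mart_archive and res are just illustrative names.

```r
library(biomaRt)

# Point useMart() at the dated archive host directly instead of the
# e98.ensembl.org shortcut, so no redirect is involved.
# Assumption: sep2019.archive.ensembl.org serves Ensembl release 98.
mart_archive <- useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                        host = "https://sep2019.archive.ensembl.org",
                        dataset = "hsapiens_gene_ensembl")

# Same query as in the question, now sent to the archive mart
res <- getBM(attributes = "ensembl_gene_id", mart = mart_archive,
             filters = "ensembl_gene_id", values = "ENSG00000000003")
```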
My suggestion would be to try useEnsembl(..., version = 98) instead of useMart().
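Spelled out, that might look roughly like the following. This is a sketch only: the biomart and dataset values are my guess at what replaces the "..." for the human gene mart from the question, and the listEnsemblArchives() call is there just to check which releases and archive hosts are on offer. mart_98 and res_98 are illustrative names.

```r
library(biomaRt)

# Table of the archived Ensembl releases biomaRt knows about,
# useful for confirming that release 98 is still available
archives <- listEnsemblArchives()

# Let biomaRt resolve the archive host from the release number.
# Assumption: biomart = "ensembl" and the human gene dataset are
# the arguments that stand in for the "..." in the suggestion.
mart_98 <- useEnsembl(biomart = "ensembl",
                      dataset = "hsapiens_gene_ensembl",
                      version = 98)

# Same query as in the question, against the release 98 mart
res_98 <- getBM(attributes = "ensembl_gene_id", mart = mart_98,
                filters = "ensembl_gene_id", values = "ENSG00000000003")
```

Pinning the release through version rather than a hand-written hostname keeps the script independent of how the e[VERSION].ensembl.org shortcuts are handled.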
Ok, this is probably what I'm looking for. I'll try it out.