Issue with biomaRt: cannot access Ensembl
RJW63 • 0
@rjw63-23367
Last seen 4.7 years ago

I have searched and found that others have had a similar issue before, so I tried the suggested solution of using a different mirror (in this case useast), but I keep getting the same error. This script always worked in the past, but it has not worked for the last two days.

ensembl <- useDataset(dataset = "hsapiens_gene_ensembl", mart = useEnsembl(biomart = 'ENSEMBL_MART_ENSEMBL', mirror = "useast"))

geneidlist <- rownames(resLFC.df)

ensembl.translate <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "gene_biotype"), filters = "ensembl_gene_id", values = geneidlist, mart = ensembl)

I keep getting this error:

Batch submitting query [========>---------------------------------------------------] 14% eta: 19s
Error in getBM(attributes = c("ensembl_gene_id", "external_gene_name", :
  The query to the BioMart webservice returned an invalid result: biomaRt expected a character string of length 1.
Please report this on the support site at http://support.bioconductor.org

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.7 (Maipo)

Matrix products: default
BLAS: /usr/lib64/libblas.so.3.4.2
LAPACK: /usr/lib64/liblapack.so.3.4.2

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
 [1] devtools_2.2.2 usethis_1.5.1 RColorBrewer_1.1-2
 [4] gplots_3.0.3 ggrepel_0.8.2 ggplot2_3.3.0
 [7] DESeq2_1.24.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0
[10] BiocParallel_1.18.1 matrixStats_0.56.0 Biobase_2.44.0
[13] GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3
[16] S4Vectors_0.22.1 BiocGenerics_0.30.0 tximport_1.12.3
[19] apeglm_1.6.0 pheatmap_1.0.12 data.table_1.12.8
[22] R.utils_2.9.2 R.oo_1.23.0 R.methodsS3_1.8.0
[25] pacman_0.5.1 biomaRt_2.40.5

loaded via a namespace (and not attached):
  [1] colorspace_1.4-1 ellipsis_0.3.0 rprojroot_1.3-2 htmlTable_1.13.3
  [5] XVector_0.24.0 base64enc_0.1-3 fs_1.3.2 rstudioapi_0.11
  [9] farver_2.0.3 remotes_2.1.1 bit64_0.9-7 AnnotationDbi_1.46.1
 [13] fansi_0.4.1 mvtnorm_1.1-0 xml2_1.2.5 splines_3.6.0
 [17] geneplotter_1.62.0 knitr_1.28 pkgload_1.0.2 zeallot_0.1.0
 [21] jsonlite_1.6.1 Formula_1.2-3 annotate_1.62.0 cluster_2.1.0
 [25] png_0.1-7 readr_1.3.1 compiler_3.6.0 httr_1.4.1
 [29] backports_1.1.5 assertthat_0.2.1 Matrix_1.2-18 cli_2.0.2
 [33] acepack_1.4.1 htmltools_0.4.0 prettyunits_1.1.1 tools_3.6.0
 [37] coda_0.19-3 gtable_0.3.0 glue_1.3.2 GenomeInfoDbData_1.2.1
 [41] dplyr_0.8.5 Rcpp_1.0.3 bbmle_1.0.23.1 vctrs_0.2.1
 [45] gdata_2.18.0 xfun_0.12 stringr_1.4.0 ps_1.3.2
 [49] testthat_2.3.2 lifecycle_0.2.0 gtools_3.8.1 XML_3.99-0.3
 [53] zlibbioc_1.30.0 MASS_7.3-51.5 scales_1.1.0 hms_0.5.3
 [57] yaml_2.2.1 curl_4.3 memoise_1.1.0 gridExtra_2.3
 [61] emdbook_1.3.12 bdsmatrix_1.3-4 rpart_4.1-15 latticeExtra_0.6-29
 [65] stringi_1.4.6 RSQLite_2.2.0 genefilter_1.66.0 desc_1.2.0
 [69] checkmate_2.0.0 caTools_1.18.0 pkgbuild_1.0.6 rlang_0.4.5
 [73] pkgconfig_2.0.3 bitops_1.0-6 lattice_0.20-40 purrr_0.3.3
 [77] labeling_0.3 htmlwidgets_1.5.1 processx_3.4.2 bit_1.1-15.2
 [81] tidyselect_1.0.0 plyr_1.8.6 magrittr_1.5 R6_2.4.1
 [85] Hmisc_4.4-0 DBI_1.1.0 pillar_1.4.3 foreign_0.8-76
 [89] withr_2.1.2 survival_3.1-11 RCurl_1.98-1.1 nnet_7.3-13
 [93] tibble_2.1.3 crayon_1.3.4 KernSmooth_2.23-16 jpeg_0.1-8.1
 [97] progress_1.2.2 locfit_1.5-9.4 grid_3.6.0 callr_3.4.3
[101] blob_1.2.1 digest_0.6.25 xtable_1.8-4 numDeriv_2016.8-1.1
[105] munsell_0.5.0 sessioninfo_1.1.1

Mike Smith ★ 6.6k
@mike-smith
Last seen 2 days ago
EMBL Heidelberg

If you didn't change anything and the code worked previously then it's almost certainly a problem with the Ensembl server. If none of the mirror sites are working any better then there is not much you can do to improve the performance except try at a different time.

If you're based in the US then there's a chance that it would have been using the useast mirror already (by default it picks the geographically closest mirror). You can force it to use the main site by providing mirror = 'www' in case that works better.
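
As a rough illustration of trying the main site and the mirrors in turn, here is a minimal sketch. The helper name `first_working_mirror` is made up for this example and is not part of biomaRt; the mirror names are the ones mentioned in this thread. The helper takes the connection function as an argument, so the actual `useEnsembl()` call (which needs network access) is only shown in the comments. Newer biomaRt releases seem to do some of this fallback on their own, so treat this as an assumption-laden sketch rather than a required step.

```r
# Sketch: try several Ensembl mirrors in turn and return the first
# connection that succeeds. `connect` is any function that takes a mirror
# name; `first_working_mirror` is a hypothetical helper, not a biomaRt call.
first_working_mirror <- function(connect, mirrors = c("www", "useast", "asia")) {
  for (m in mirrors) {
    # Capture the error as a value instead of aborting the loop
    result <- tryCatch(connect(m), error = function(e) e)
    if (!inherits(result, "error")) {
      message("Connected via mirror: ", m)
      return(result)
    }
    message("Mirror '", m, "' failed: ", conditionMessage(result))
  }
  stop("None of the mirrors responded: ", paste(mirrors, collapse = ", "))
}

# Intended use with biomaRt (needs network access, so left commented out):
# library(biomaRt)
# ensembl <- first_working_mirror(function(m) {
#   useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", mirror = m)
# })
```

Putting 'www' first mirrors the suggestion above of forcing the main site; reorder the vector if a different mirror is usually more reliable for you.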

This issue has been happening for a while, so now, when you get the "batch submitting" message, biomaRt should also cache blocks of interim results. It won't show them to you, but if you run the exact query again it will read anything it has already received from the server from your local disk, and the progress bar will always get at least as far as it did last time. This was added so that you should always be able to complete long queries eventually, even if the server is unstable.

If that's not what it's doing in this case please let me know.


I have been getting this error recently, too, but only when submitting large queries with filters and values set. It seems that Ensembl may [for now] not like large queries like that; however, it is just as easy to skip filters and values and still get the annotation that you need. For example, retrieving the entire table from Ensembl is likely to be quicker than a lookup of specific values, and you can then use the full table as a lookup:

require(biomaRt)
mart <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl', host = 'useast.ensembl.org')
annot <- getBM(
  mart = mart,
  attributes = c('ensembl_gene_id','hgnc_symbol','gene_biotype'))

...as opposed to a focused lookup:

require(biomaRt)
mart <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl', host = 'useast.ensembl.org')
annot <- getBM(
  values = genes, # vector of >19000 Ensembl gene IDs
  filters = 'ensembl_gene_id',
  mart = mart,
  attributes = c('ensembl_gene_id','hgnc_symbol','gene_biotype'))
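
To make the "use the entire table as a lookup" step concrete, here is a small sketch. The `annot` data frame below is a three-row toy stand-in for the real unfiltered `getBM()` result, and `ENSG00000000000` is a made-up ID included only to show what an unmatched gene looks like. Stripping a trailing version suffix (such as `.14`) is an assumption about how the IDs may be formatted, not something stated in this thread.

```r
# Toy stand-in for the table returned by the unfiltered getBM() call
annot <- data.frame(
  ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618"),
  hgnc_symbol     = c("TP53", "BRCA1", "BRCA2"),
  gene_biotype    = "protein_coding",
  stringsAsFactors = FALSE
)

# IDs to annotate; the second one is deliberately absent from the table
genes <- c("ENSG00000139618.14", "ENSG00000000000", "ENSG00000141510")

# Assumption: drop a trailing ".<version>" suffix, if any, before matching
genes <- sub("\\.[0-9]+$", "", genes)

# match() keeps the order of `genes`; IDs missing from the table give NA rows
lookup <- annot[match(genes, annot$ensembl_gene_id), ]
lookup$ensembl_gene_id <- genes   # keep the queried ID even for unmatched rows
rownames(lookup) <- NULL
print(lookup)
```

Note that `match()` returns only the first hit per ID, so if the downloaded table has several rows for one gene, only the first of them is kept.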

Hello, thank you for this useful answer. I am trying to get into Ensembl and it seems to be completely down, as when I use mirror = 'www' this message appears:

Ensembl site unresponsive, trying useast mirror
Ensembl site unresponsive, trying asia mirror
Error in .chooseEnsemblMirror(mirror = mirror, httr_config = httr_config) : 
  Unable to query any Ensembl site

This is the first time I have experienced this. My problem in the past has been dealing with the curl timeout, which gives errors like:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.ensembl.org:443] Operation timed out after 300004 milliseconds with 4728314 bytes received

Now the biggest problem is that I have a query with 20000 entries, which I am manually cutting into pieces of 3000 entries (without looping). I have been trying this for days, and one or the other of the errors above always appears. If I run only 10 entries, everything works properly...

I have cut out all attributes to retrieve and kept only the basics (IDs and coding sequences).

Can you help?


Please don't add comments to very old posts. In the future, just ask a new question.

You shouldn't have to manually cut things into pieces, as biomaRt will happily do that for you. You might need to increase your timeout though (options(timeout = 1e5)). As for connecting to a Biomart server, you could use this dumb function I just wrote for a script, to protect against connection problems.

getMart <- function(biomart, species) {
    e <- simpleError("")
    while(is(e, "simpleError")) {
        e <- tryCatch(useEnsembl(biomart, species), error = function(x) x)
    }
    e
}
## which you use like this, substituting your species if not human
mart <- getMart("ensembl","hsapiens_gene_ensembl")

It will just keep trying until it gets a connection. Dumb but effective.
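
If you would rather not loop forever when every server is down, a bounded variant of the same idea might look like the sketch below. The names `with_retries`, `max_tries` and `wait` are invented for this example, and the defaults are arbitrary. It wraps any zero-argument function, so the biomaRt calls (which need network access) appear only in the comments.

```r
# Sketch: retry a flaky call a limited number of times, pausing between
# attempts. `with_retries` is a hypothetical helper, not part of biomaRt.
with_retries <- function(f, max_tries = 5, wait = 10) {
  stopifnot(max_tries >= 1)
  for (i in seq_len(max_tries)) {
    # Turn an error into a value so the loop can decide what to do next
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", i, " of ", max_tries, " failed: ", conditionMessage(result))
    if (i < max_tries) Sys.sleep(wait)  # seconds to pause before the next try
  }
  stop("Giving up after ", max_tries, " attempts: ", conditionMessage(result))
}

# Intended use with biomaRt (needs network access, so left commented out):
# library(biomaRt)
# mart <- with_retries(function() useEnsembl("ensembl", "hsapiens_gene_ensembl"))
# annot <- with_retries(function() {
#   getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "gene_biotype"),
#         filters = "ensembl_gene_id", values = genes, mart = mart)
# })
```

The same wrapper can be put around `getBM()`: since biomaRt is described earlier in this thread as caching interim results of batch queries, each retry of an identical query should pick up roughly where the previous attempt stopped.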
