Question: biomaRt error: Query ERROR: caught BioMart::Exception: non-BioMart die():
8 months ago, Georg Otto (United Kingdom) wrote:

Hi,

I have a vector with Ensembl gene IDs

[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485"
[5] "ENSG00000274890" "ENSG00000237613"

I am trying to annotate the IDs using biomaRt

> library(biomaRt)

> ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
dataset = "hsapiens_gene_ensembl",
host = "www.ensembl.org")

However, I get an error. The curious thing is that I get this error only when my vector has length 993 or longer, never when it is shorter, using a random selection of IDs. So this always works:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filters = "ensembl_gene_id", values = sample(gene.id, 992), mart = ensembl, uniqueRows = TRUE)

And this gives me an error:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filters = "ensembl_gene_id", values = sample(gene.id, 993), mart = ensembl, uniqueRows = TRUE)

Error in biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol",  :
Query ERROR: caught BioMart::Exception: non-BioMart die():
not well-formed (invalid token) at line 1, column 16292, byte 16292 at /nfs/public/release/ensweb-software/sharedsw/2017_04_03/linuxbrew/Cellar/perl/5.24.1/lib/perl5/site_perl/5.24.1/x86_64-linux-thread-multi/XML/Parser.pm line 187.
XML::Simple called at /nfs/public/release/ensweb/latest/live/mart/www_90/biomart-perl/lib/BioMart/Query.pm line 1935.

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/libRblas.so
LAPACK: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_2.32.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.13         IRanges_2.10.5       XML_3.98-1.9
[4] digest_0.6.12        bitops_1.0-6         DBI_0.7
[7] stats4_3.4.2         RSQLite_2.0          rlang_0.1.2
[10] blob_1.1.0           S4Vectors_0.14.7     tools_3.4.2
[13] bit64_0.9-7          Biobase_2.36.2       RCurl_1.95-4.8
[16] bit_1.1-12           parallel_3.4.2       compiler_3.4.2
[19] BiocGenerics_0.22.1  AnnotationDbi_1.38.2 memoise_1.1.0
[22] tibble_1.3.4

Any idea what is going on?

Cheers,

Georg

8 months ago, Mike Smith (EMBL Heidelberg / de.NBI) wrote:

That's a new error to me!  I suspect that something is wrong with the back end database, rather than with the biomaRt package.

One thing you can try is to use one of the mirror services to see whether it is also affected, e.g.:

ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
dataset = "hsapiens_gene_ensembl",
host = "asia.ensembl.org")

Alternatively, you can try the development version of biomaRt. It's not recommended to run queries with more than 500 search values. In practice it's often fine, but occasionally results won't be returned and you'll have no idea that's happened. The devel package has a modification that breaks your query down into chunks of 500, runs them independently, and then splices the results back together. Since your issue seems so deterministic, perhaps this modification will be sufficient. You can install it using:

BiocInstaller::biocLite('grimbough/biomaRt')

A quick test on my side suggests the uniqueRows argument won't work properly at the moment, but you can remove duplicate rows yourself in post-processing.
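For anyone who wants to do the same thing by hand on the release version, here is a rough sketch of the chunk-and-splice idea. It is not the code from the devel package: `chunked_query` and its `query_fun` argument are names made up for this post, and `query_fun` stands for whatever function runs one lookup and returns a data.frame.

```r
# Sketch only: chunked_query() and query_fun are hypothetical names,
# not part of biomaRt.
chunked_query <- function(values, query_fun, chunk_size = 500) {
  values <- unique(values)
  stopifnot(length(values) > 0, chunk_size >= 1)

  # Assign each value to chunk 1, 2, 3, ... of at most chunk_size items.
  chunk_index <- ceiling(seq_along(values) / chunk_size)
  chunks <- split(values, chunk_index)

  # Run every chunk independently, then splice the pieces back together.
  results <- lapply(chunks, query_fun)
  combined <- do.call(rbind, unname(results))

  # Duplicate rows are dropped here in post-processing.
  combined <- unique(combined)
  rownames(combined) <- NULL
  combined
}

# Possible use with the objects from the question (needs network access):
# annot <- chunked_query(gene.id, function(ids) {
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
#                  filters = "ensembl_gene_id", values = ids, mart = ensembl)
# })
```

Passing the lookup in as a function keeps the splitting logic separate from the BioMart call, so it can be tried out offline with a dummy function first.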

Those answers are still valid, but I want to add that I don't experience the problem you're seeing, so maybe it has already been fixed at the Ensembl side.

Thanks a lot. I tried both suggested solutions. With the mirror service I got the same error. Installing and using the devel package, however, made the error go away. Just to clarify: the recommendation not to run queries with more than 500 search values relates to the devel package, not the release package, right? I have routinely used biomaRt to run queries with thousands of search values.


The 500-value limit has always applied to queries sent to BioMart, whether via biomaRt or the Ensembl web interface. For the most part you can submit more than 500 filter values and it will be fine, but if there is a problem you won't know anything about it, because it happens silently.
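Since the failure is silent, one way to notice it is to compare what you sent with what came back. A small sketch, assuming the ID attribute is among the requested attributes; `missing_ids` is a helper name invented for this post, not a biomaRt function.

```r
# Sketch only: missing_ids() is a hypothetical helper.
# Returns the submitted IDs that do not appear in the result's ID column.
missing_ids <- function(submitted, result, id_column = "ensembl_gene_id") {
  stopifnot(id_column %in% colnames(result))
  setdiff(unique(submitted), result[[id_column]])
}

# Possible use with the objects from the question:
# missing_ids(gene.id, mat.cpm.annot)
```

A non-empty result is a prompt to look closer rather than proof of a dropped query, since an ID can also be absent for other reasons, for example if it has been retired from the current Ensembl release.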

This is obviously really undesirable, hence the patch. I only committed this to the devel branch in case it broke some other functionality, but no one has reported anything, and it's now part of the new release branch that was released this week.

If you are submitting queries with thousands of gene IDs or the like you should definitely be using biomaRt version 2.33.1 or newer just to be on the safe side.
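If you want a script to check this before sending a large query, something along these lines should work. `has_min_version` is a name made up for this post; the comparison itself uses base R's `package_version()`, which compares version components numerically rather than as text.

```r
# Sketch only: has_min_version() is a hypothetical helper.
# TRUE if `installed` is at least `minimum`.
has_min_version <- function(installed, minimum = "2.33.1") {
  package_version(installed) >= package_version(minimum)
}

# With biomaRt installed:
# has_min_version(as.character(utils::packageVersion("biomaRt")))
```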

8 months ago, Georg Otto (United Kingdom) wrote:

I can confirm that upgrading Bioconductor to version 3.6 solved the problem.