Question: biomaRt error: Query ERROR: caught BioMart::Exception: non-BioMart die():
Georg Otto (United Kingdom) wrote, 11 months ago:


I have a vector of Ensembl gene IDs:

[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485"
[5] "ENSG00000274890" "ENSG00000237613"

I am trying to annotate the IDs using biomaRt:

> library(biomaRt)

> ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl",
                   host = "")

However, I get an error. The curious thing is that I get this error only when my vector has 993 or more elements, never when it is shorter; I tested this with random selections of IDs. So this always works:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filter = "ensembl_gene_id",, 992), mart = ensembl, uniqueRows = TRUE)

And this gives me an error:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filter = "ensembl_gene_id",, 993), mart = ensembl, uniqueRows = TRUE)

Error in biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol",  :
  Query ERROR: caught BioMart::Exception: non-BioMart die():
not well-formed (invalid token) at line 1, column 16292, byte 16292 at /nfs/public/release/ensweb-software/sharedsw/2017_04_03/linuxbrew/Cellar/perl/5.24.1/lib/perl5/site_perl/5.24.1/x86_64-linux-thread-multi/XML/ line 187.
XML::Simple called at /nfs/public/release/ensweb/latest/live/mart/www_90/biomart-perl/lib/BioMart/ line 1935.


> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/
LAPACK: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.32.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13         IRanges_2.10.5       XML_3.98-1.9        
 [4] digest_0.6.12        bitops_1.0-6         DBI_0.7             
 [7] stats4_3.4.2         RSQLite_2.0          rlang_0.1.2         
[10] blob_1.1.0           S4Vectors_0.14.7     tools_3.4.2         
[13] bit64_0.9-7          Biobase_2.36.2       RCurl_1.95-4.8      
[16] bit_1.1-12           parallel_3.4.2       compiler_3.4.2      
[19] BiocGenerics_0.22.1  AnnotationDbi_1.38.2 memoise_1.1.0       
[22] tibble_1.3.4        


Any idea what is going on?
Mike Smith (EMBL Heidelberg / de.NBI) wrote, 11 months ago:

That's a new error to me! I suspect that something is wrong with the back-end database rather than with the biomaRt package.

One thing you can try is using one of the mirror services to see whether it is unaffected, e.g.:

ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl",
                   host = "")

Alternatively, you can try the development version of biomaRt. Running queries with more than 500 search values is not recommended: in practice it often works, but occasionally some results are not returned, and you get no indication that this has happened. The devel package includes a modification that splits your query into chunks of 500 values, runs each chunk independently, and then splices the results back together. Since your issue seems so deterministic, this modification may be enough to fix it. You can install it using:


A quick test suggests that the uniqueRows argument doesn't work properly at the moment, but you can remove duplicate rows yourself in post-processing.
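The chunk-and-combine behaviour described above for the devel package can be approximated by hand. The following is a minimal sketch, not biomaRt's internal implementation: `query_in_chunks()` is a hypothetical helper, and `query_fun` is assumed to be any function that takes a vector of IDs and returns a data frame (for example a small wrapper around `getBM()`). Duplicate rows are dropped afterwards with `unique()`, which is the post-processing step mentioned above.

```r
# query_in_chunks() is a hypothetical helper, not part of biomaRt.
# It splits `values` into chunks of at most `chunk_size` elements, runs
# `query_fun` on each chunk, stacks the results, and drops duplicate rows.
query_in_chunks <- function(values, query_fun, chunk_size = 500) {
  # Chunk index per value: the first chunk_size values get 1, the next get 2, ...
  chunk_id <- ceiling(seq_along(values) / chunk_size)
  chunks <- split(values, chunk_id)

  # One query per chunk, each expected to return a data frame
  results <- lapply(chunks, query_fun)

  # Splice the per-chunk results back together
  combined <- do.call(rbind, unname(results))
  rownames(combined) <- NULL

  # Post-processing replacement for uniqueRows = TRUE
  unique(combined)
}

# Example usage (assumes `ensembl` was created with useMart() and `ids` is a
# character vector of Ensembl gene IDs):
# annot <- query_in_chunks(ids, function(chunk) {
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id",
#                                 "hgnc_symbol", "description"),
#                  filters = "ensembl_gene_id", values = chunk,
#                  mart = ensembl)
# })
```

Passing the query as a function keeps the chunking logic independent of any particular mart, so it can be tried offline with a dummy function before sending real requests.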


Those suggestions still stand, but I should add that I can't reproduce the problem you're seeing, so it may already have been fixed on the Ensembl side.

Reply by Mike Smith, 11 months ago

Thanks a lot. I tried both suggested solutions. With the mirror service I got the same error. Installing and using the devel package, however, made the error go away. Just to clarify: the recommendation not to run queries with more than 500 search values relates to the devel package, not the release package, right? I have routinely used biomaRt to run queries with thousands of search values.

Reply by Georg Otto, 11 months ago

The 500-value limit has always applied to queries sent to BioMart, whether through biomaRt or the Ensembl web interface. Most of the time you can submit more than 500 filter values and it will be fine, but if there is a problem you won't be told about it: it happens silently.

This is obviously undesirable, hence the patch. I only committed it to the devel branch in case it broke some other functionality, but no one has reported any problems, and it is now part of the new release branch that came out this week.

If you are submitting queries with thousands of gene IDs or similar, you should definitely be using biomaRt 2.33.1 or newer, just to be on the safe side.

Reply by Mike Smith, 11 months ago
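Because the failure described in this thread is silent, one cheap safeguard is to compare the IDs you asked for with the IDs that came back. A minimal sketch, assuming the result is a data frame with an `ensembl_gene_id` column as in the `getBM()` call in the question; `missing_ids()` is a hypothetical helper. A non-empty result does not by itself prove that the query failed, since an ID may simply be absent from the current Ensembl release, but it tells you which IDs to look at.

```r
# missing_ids() is a hypothetical helper: it returns the requested IDs that
# do not appear in the `id_col` column of a query result.
missing_ids <- function(requested, result, id_col = "ensembl_gene_id") {
  setdiff(unique(requested), result[[id_col]])
}

# Example usage (assumes `ids` and `annot` come from a getBM() call like the
# one in the question):
# lost <- missing_ids(ids, annot)
# if (length(lost) > 0) message(length(lost), " IDs returned no rows")
```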
Georg Otto (United Kingdom) wrote, 11 months ago:

I can confirm that upgrading Bioconductor to version 3.6 solved the problem.
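To check whether an installed biomaRt meets the 2.33.1 threshold recommended earlier in the thread, a version comparison is enough. A minimal sketch: `biomart_is_recent()` is a hypothetical helper, while `package_version()` (base) and `packageVersion()` (utils) are standard R functions.

```r
# biomart_is_recent() is a hypothetical helper: TRUE when `version` is at
# least `minimum` (default "2.33.1", the version recommended in this thread).
biomart_is_recent <- function(version, minimum = "2.33.1") {
  package_version(as.character(version)) >= package_version(minimum)
}

# Example usage (requires biomaRt to be installed):
# biomart_is_recent(packageVersion("biomaRt"))
```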
