Question: biomaRt - getBM - biomart= ENSEMBL_MART_FUNCGEN !!! NOT WORKING!!!
3.4 years ago by
France/Paris/Institut Pasteur

Dear all,

I have been having trouble for the past two weeks with the biomaRt functions getBM and useMart. Previously I ran it as shown below and got the information back. Please, I would like to know whether this is a server problem or whether I am really doing something wrong.

Thank you.

Let me explain. To connect to BioMart I have been using:

Ensembl_Fun <- useMart(biomart = "ENSEMBL_MART_FUNCGEN",
dataset="hsapiens_annotated_feature",
host = "sep2015.archive.ensembl.org")

and (I tried another way too):

Ensembl_Fun <- useMart(biomart="ENSEMBL_MART_FUNCGEN",
host="grch37.ensembl.org",
path="/biomart/martservice",
dataset="hsapiens_gene_ensembl")

PS: I was querying different datasets, but I have trouble with all of them ("hsapiens_annotated_feature", "hsapiens_motif_feature", "hsapiens_regulatory_feature", "hsapiens_mirna_target_feature", "hsapiens_segmentation_feature", "hsapiens_external_feature").

Then the attributes I am asking for (I know each case is different, but this is an example):

Annotation_Ensembl_Fun <- getBM(attributes = c("chromosome_name",
"chromosome_start",
"chromosome_end",
"feature_type_name",
"feature_type_class",
"feature_type_description",
"cell_type_name",
"so_accession"),
filters = "chromosomal_region",
values = Chr_Region,
mart = Ensembl_Fun)

And each time I run it (it takes about 20 minutes), I get:

Error in value[[3L]](cond) :
Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.

Here is the session info:

sessionInfo()
R version 3.2.4 (2016-03-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.5 (El Capitan)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape_0.8.5  ggplot2_2.1.0  biomaRt_2.26.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.5          IRanges_2.4.8        XML_3.98-1.4         bitops_1.0-6         grid_3.2.4
[6] plyr_1.8.3           gtable_0.2.0         DBI_0.4-1            stats4_3.2.4         RSQLite_1.0.0
[11] scales_0.4.0         S4Vectors_0.8.11     labeling_0.3         tools_3.2.4          Biobase_2.30.0
[16] munsell_0.4.3        RCurl_1.95-4.8       parallel_3.2.4       BiocGenerics_0.16.1  AnnotationDbi_1.32.3
[21] colorspace_1.2-6

and the traceback:

traceback()
6: stop("Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.")
5: value[[3L]](cond)
4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
3: tryCatchList(expr, classes, parentenv, handlers)
2: tryCatch(postForm(paste(martHost(mart), "?", sep = ""), query = xmlQuery),
error = function(e) {
stop("Request to BioMart web service failed. Verify if you are still connected to the internet.  Alternatively the BioMart web service is temporarily down.")
})
1: getBM(attributes = c("chromosome_name", "chromosome_start", "chromosome_end",
"feature_type_name", "feature_type_class", "feature_type_description",
"cell_type_name", "so_accession"), filters = "chromosomal_region",
values = Chr_Region, mart = Ensembl_Fun)

Thanks a lot,

modified 3.4 years ago by Dan Staines70 • written 3.4 years ago by juan-pablo.cerapio-arroyo10

If you want people to be able to help, you need to include a self-contained example that shows the problem you are seeing. As it stands, nobody can reproduce your results, because nobody knows what Chr_Region is. Ideally you would subset whatever that is down to something that will cause the problem you are seeing in much less time than 20 minutes, because nobody is going to wait 20 minutes to help somebody they don't know.
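A self-contained example along these lines might look like the sketch below. It assumes the mart, dataset, and host named in the question are still reachable. The two regions are taken from the list posted later in this thread, and `run_example` is a hypothetical helper name, not part of biomaRt.

```r
# Two regions only, so the query should finish quickly (values from this thread).
Chr_Region <- c("1:50513644:50514320", "1:156593657:156595283")

# dput() prints the object as R code that others can paste directly.
dput(Chr_Region)

# Hypothetical wrapper: nothing contacts the server until it is called.
run_example <- function(regions) {
  mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                           dataset = "hsapiens_annotated_feature",
                           host = "sep2015.archive.ensembl.org")
  biomaRt::getBM(attributes = c("chromosome_name", "chromosome_start",
                                "chromosome_end", "feature_type_name"),
                 filters = "chromosomal_region",
                 values = regions,
                 mart = mart)
}

# Uncomment to run against the server, then include the session details:
# result <- run_example(Chr_Region)
# sessionInfo()
```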

Answer: biomaRt - getBM - biomart= ENSEMBL_MART_FUNCGEN !!! NOT WORKING!!!
3.4 years ago by
Dan Staines70
EMBL-EBI
Dan Staines70 wrote:

A follow up on this. I can confirm that the main BioMart at ensembl.org, including the web interface and martservice (which I believe biomaRt uses), works fine for the previous example. Chromosome 1 takes approximately 10 minutes to download due to the sheer volume of data, but the Y chromosome takes less than a second - I get similar results using biomaRt.

However, queries on individual regions (e.g. 1:100:100000) seem to be quite slow. I'll try and investigate why this is happening, though it would be very useful to know the volume and size of the regions you're using for queries.

Thanks,

Dan.

Answer: biomaRt - getBM - biomart= ENSEMBL_MART_FUNCGEN !!! NOT WORKING!!!
3.4 years ago by
Dan Staines70
EMBL-EBI
Dan Staines70 wrote:

Hi Juan-Pablo,

I'd like to echo James's comments here - a self-contained example would be a great help in figuring out how to help you here. In particular, knowing what Chr_Region is set to would be very helpful. If we also knew your use-case, it would help us advise you on the best way to get the data you need.

Having said that, can I ask a couple more questions? Firstly, is there any reason why you're using the September archive or the GRCh37 instance rather than ensembl.org? Secondly, if this was working before, can you confirm the rough runtime and the number of rows returned by your successful script?
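One way to report the runtime and row count asked for above is sketched here. `time_query` and `fake_query` are hypothetical names; the stand-in query only exists so the sketch runs without a network connection, and in practice it would be replaced by a function wrapping the real getBM call.

```r
# Run a query function, reporting elapsed seconds and rows returned.
# `query` is any zero-argument function that returns a data frame,
# for example function() getBM(...) with the arguments from the question.
time_query <- function(query) {
  elapsed <- system.time(result <- query())[["elapsed"]]
  list(seconds = elapsed, rows = nrow(result), result = result)
}

# Stand-in query (not real BioMart output) so the sketch runs offline.
fake_query <- function() data.frame(chromosome_name = c("1", "1", "2"))

timing <- time_query(fake_query)
timing$seconds  # elapsed wall-clock time in seconds
timing$rows     # number of rows returned
```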

However, I can confirm that the biomart instance at ensembl.org is behaving as expected and we're not seeing any unusual load on the mart server. If I use the biomart web interface to query hsapiens_annotated_feature with the filters and attributes in your example using chromosome 1, the full results are returned in a minute or two (though are over 500M and nearly 4 million rows). This is the URL if you can confirm this is a query you're having problems with:

http://www.ensembl.org/biomart/martview/f7bfbf194e6d312846ab46579f8bbe49?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_annotated_feature.default.annotated_feature.chromosome_name|hsapiens_annotated_feature.default.annotated_feature.chromosome_start|hsapiens_annotated_feature.default.annotated_feature.chromosome_end|hsapiens_annotated_feature.default.annotated_feature.feature_type_name|hsapiens_annotated_feature.default.annotated_feature.feature_type_class|hsapiens_annotated_feature.default.annotated_feature.so_accession|hsapiens_annotated_feature.default.annotated_feature.feature_type_description|hsapiens_annotated_feature.default.annotated_feature.cell_type_name|hsapiens_annotated_feature.default.annotated_feature.cell_type_description&FILTERS=hsapiens_annotated_feature.default.filters.chromosome_name."1"&VISIBLEPANEL=resultspanel
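For comparison, a biomaRt call roughly equivalent to the web query in that URL could be sketched as follows. The attribute and filter names are copied from the URL; the mart, dataset, and host are assumed to be served as they were at the time of this thread, and `fetch_chromosome` is a hypothetical helper name.

```r
# Attribute names copied from the martview URL above.
url_attributes <- c("chromosome_name", "chromosome_start", "chromosome_end",
                    "feature_type_name", "feature_type_class", "so_accession",
                    "feature_type_description", "cell_type_name",
                    "cell_type_description")

# Hypothetical helper; the server is contacted only when it is called.
fetch_chromosome <- function(chromosome) {
  mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_FUNCGEN",
                           dataset = "hsapiens_annotated_feature",
                           host = "www.ensembl.org")
  biomaRt::getBM(attributes = url_attributes,
                 filters = "chromosome_name",  # whole-chromosome filter from the URL
                 values = chromosome,
                 mart = mart)
}

# A small chromosome keeps the download short:
# features_y <- fetch_chromosome("Y")
```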

Thanks,

Dan.

Hi Dan,

Thanks for taking the time to look at this.

First of all, I am specifically using the September archive or the GRCh37 instance because a friend who currently works with biomaRt told me that it sometimes gets slow with the latest version while it is being updated, so that is why I tried these two options.

Here is an example of my regions; there are around 600 of them.

"1:50513644:50514320"    "1:156593657:156595283"  "1:171810467:171811325"
"1:71512224:71513804"    "1:18969548:18970739"    "1:92945907:92952609"    "1:40253683:40255172"
"1:41283840:41284591"    "1:6208716:6209039"      "1:53308294:53309262"    "1:119529819:119530712"
"1:181452706:181453073"  "1:184942862:184943777"  "1:1181756:1182470"      "1:27560705:27561707"
"2:105468851:105473488"  "2:105468851:105473488"  "2:63274475:63279430"    "2:63274475:63279430"
"2:63274475:63279430"    "2:63274475:63279430"    "2:175199463:175202639"

I have already tried getting the info without a filter and I get nothing that way either, so I am not sure what is going on.

I confirm that the query I was using is the one you posted, but with region filters like these, around 600 of them.

Thank you,

Jp

OK, thanks for the extra detail. I've identified the problem (which just affects region queries like this) - the queries will work, but are _very_ slow. I'm also puzzled that this has ever worked in a performant manner! There are a number of solutions, but I need to discuss the best approach with a colleague who is away from the office for the week. I hope this is ok for you - I'll keep you posted in case this changes.

Dan.

Thanks for looking into this Dan.  Is this an issue with the biomaRt package, or something inefficient server side?

It looks like a problem on the server side just for this particular use-case (region queries) - otherwise things seem to be working as expected on the web interface and with martservice.

Hi Dan,

OK, so I am not doing anything wrong? So what should I do, or should I ask them to take a look at it? I have already spent two weeks trying this!

Thanks a lot for the help.

Jp

Does each query use just one region, or multiple regions at once?

Multiple at the same time

Jp

I'm going to look into a fix at my end, but it might not be possible for at least a few days. In the meantime, the queries do work, but are slower than is ideal - one region query of this kind takes about 20-30s, but will complete without biomaRt timing out. Therefore, you could modify your script to do one region query at a time.
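The one-region-at-a-time approach could be sketched as below. `query_one_at_a_time` and `query_region` are hypothetical names; the stand-in query only simulates a server response so the sketch runs offline, and in practice it would wrap the getBM call from the question with `values` set to a single region.

```r
# Query regions one at a time and stack the results.
# `query_region` takes one region string and returns a data frame, e.g.
#   function(r) getBM(attributes = ..., filters = "chromosomal_region",
#                     values = r, mart = Ensembl_Fun)
query_one_at_a_time <- function(regions, query_region) {
  results <- lapply(regions, function(region) {
    tryCatch(query_region(region),
             error = function(e) {
               message("Region ", region, " failed: ", conditionMessage(e))
               NULL  # skip a failed region rather than abort the whole run
             })
  })
  results <- Filter(Negate(is.null), results)  # drop the failed regions
  do.call(rbind, results)
}

# Stand-in for the real query (simulated, not BioMart output).
fake_query <- function(region) {
  if (region == "bad") stop("simulated failure")
  data.frame(region = region, stringsAsFactors = FALSE)
}

out <- query_one_at_a_time(c("1:50513644:50514320", "bad",
                             "2:63274475:63279430"), fake_query)
```

At the 20-30s per region quoted above, about 600 regions would take roughly 3.3 to 5 hours in total, so collecting partial results as they arrive is worth the extra few lines.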

Thanks Dan,

I have around 600 regions, so I will try to get ... but in the meantime I will wait for news from you on whether you could fix it.

Thank you very much.

Have a nice day.

Hi Juan,

Sorry you're having problems with Ensembl BioMart. Unfortunately, performance does suffer when requesting features on a high number of regions; this is something that we are aware of and are actively trying to fix, as Dan pointed out. Our benchmarking so far has shown that 500 is the highest number of regions that can be specified while still getting results back in a reasonable time. Hence, a maximum of 500 regions is also our recommendation to our users. I would suggest that you split your query into sets of 300 regions or fewer for a faster response.

Cheers,

Amonida

--
Amonida Zadissa
Ensembl Production - QC
EMBL-EBI
Hinxton
England
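Splitting the regions into sets of at most 300 could be sketched like this. `split_regions` is a hypothetical helper, and the region strings are synthetic, generated only for illustration. Because the example list earlier in the thread contains repeated regions, the sketch also deduplicates before splitting, which is an addition rather than part of the recommendation.

```r
# Split a vector of regions into batches of at most `chunk_size`.
split_regions <- function(regions, chunk_size = 300) {
  regions <- unique(regions)  # repeated regions would only repeat the work
  split(regions, ceiling(seq_along(regions) / chunk_size))
}

# Synthetic region strings; integer arithmetic avoids scientific notation.
starts <- seq_len(650) * 1000L
regions <- paste0("1:", starts, ":", starts + 500L)

batches <- split_regions(regions)
lengths(batches)  # number of regions in each batch
```

Each element of `batches` can then be passed as `values` to a separate getBM call and the results combined with `rbind`.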

Thank you very much for the fast answers! I will try it this way; I hope you can find a solution.

Juan Pablo Cerapio

I wasn't at my lab when we were discussing these BioMart problems. I tried with fewer filters 7 days ago, while I was away from my lab, and it worked with 100 regions; but now it is not working, not even if I reduce the number of filters to 10.

Thanks,

Juan Pablo

So, to confirm - this was working 7 days ago (9th June), but wasn't working yesterday? Can you be more specific with the time of day when it wasn't working? Also, if you've changed location and it stops working I'd strongly suggest looking at your local proxy setup. Thanks, Dan.
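A quick way to see which proxy settings the current R session has picked up is sketched below. It only inspects the conventional proxy environment variables; whether these are the settings that matter on a given network is an assumption, and the list of variable names is not exhaustive.

```r
# Proxy-related environment variables that HTTP clients commonly read.
proxy_vars <- c("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY",
                "no_proxy")

# unset = NA distinguishes "not set" from "set to an empty string".
proxy_settings <- Sys.getenv(proxy_vars, unset = NA)
print(proxy_settings)

# TRUE when at least one of these variables is set in this session.
any(!is.na(proxy_settings))
```

Comparing this output between the two locations would show whether the session's proxy configuration differs.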

Hello Dan,

After the comment from Amonida I tried with just 100 regions and it worked; it did not take very long, but it worked. I was in Lyon, France at the time. Now I am in Paris, and I don't know if it could be the internet connection. I was using Wi-Fi in Lyon and here I am using a cable connected to the computer.

Should I modify the proxy setup? It worked before on this computer, but now it just starts taking a long time and then gives the error message.

Thanks.

Jp