Question: biomaRt: list of hosts accessed
11 months ago by
ashok.ragavendran0 wrote:

Hello all,

I am trying to use the biomaRt package from within a secure computing environment. In this context we have to configure the firewall to allow only specific IPs, to maintain compliance with security procedures. As I understand how the biomaRt package works, a request is sent to

www.ensembl.org/biomart/martservice

which returns an XML result listing the various datasets and the hosts they are located on. So, as I understand it, the hostname is not hard-coded into the package; only the path to the mart service is. Am I right in this presumption, or do I misunderstand the code in the package?

I originally tried contacting the Ensembl help desk about this, as in my previous experience they have been very responsive and knowledgeable. However, this time the support person kept insisting that I find out which servers biomaRt accesses.
I would be much obliged for any help in this regard and for any insights from the community.

Cheers

Ashok

modified 11 months ago by Mike Smith2.7k • written 11 months ago by ashok.ragavendran0
11 months ago by
Mike Smith2.7k
EMBL Heidelberg / de.NBI
Mike Smith2.7k wrote:

You're correct that the host name is not hard coded into the package (at least it shouldn't be).  biomaRt (the R package) is designed as a tool to access any server that is running BioMart (the service) to provide access to its data.  The host argument is used to specify this in most functions.  Ensembl is by far the most widely used BioMart instance, and so www.ensembl.org is typically used as the default, but you can specify whichever URL you require.

However, you're also correct that the value given to the host argument isn't necessarily the final address of the data.  When selecting a mart, an initial query is made to the XML registry of marts held on the specified server.  Each entry in that registry has its own host="" value, which is where the data are really accessed.  You can view the Ensembl registry here:

http://www.ensembl.org/biomart/martservice?type=registry

If you need to view the registry of a different BioMart server you can typically use the same URL format, just replacing the www.ensembl.org part with the server you're using.  However, in my experience it's very rare to find a host="" value in the registry that isn't the same as the initial host (I've seen it only once, and in that instance it was broken anyway).
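For a firewall allow-list it may help to pull the host="" values out of a registry document programmatically. Here is a rough base R sketch of that idea. The registry snippet is a made-up illustration (the element layout and the mart.example.org entry are assumptions, not a copy of the live Ensembl registry), and registry_hosts is a hypothetical helper, not part of biomaRt.

```r
# Illustrative registry text; the entries and attribute values are assumptions.
registry_xml <- '<MartRegistry>
  <MartURLLocation name="ENSEMBL_MART_ENSEMBL" host="www.ensembl.org" path="/biomart/martservice" port="80" />
  <MartURLLocation name="ENSEMBL_MART_SNP" host="www.ensembl.org" path="/biomart/martservice" port="80" />
  <MartURLLocation name="EXAMPLE_MART" host="mart.example.org" path="/biomart/martservice" port="80" />
</MartRegistry>'

# Hypothetical helper: return the unique host="..." values in a registry document.
registry_hosts <- function(xml_text) {
  # Accept either a single string or a vector of lines (e.g. from readLines()).
  xml_text <- paste(xml_text, collapse = "\n")
  # Require whitespace before "host" so other attribute names are not matched.
  matches <- regmatches(xml_text, gregexpr('[[:space:]]host="[^"]*"', xml_text))[[1]]
  hosts <- sub('^[[:space:]]host="([^"]*)"$', "\\1", matches)
  # Drop empty values and duplicates.
  unique(hosts[nzchar(hosts)])
}

registry_hosts(registry_xml)
```

To check a real server, the same function should work on the text returned by readLines() for the registry URL given above, provided the machine is allowed to reach it.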

One caveat if you're trying to access Ensembl is that your requests may be redirected to the geographically nearest mirror.  Thus you can end up accessing useast.ensembl.org, uswest.ensembl.org or asia.ensembl.org even if you explicitly set host = "www.ensembl.org".  To avoid this you can use the argument ensemblRedirect = FALSE.
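As a minimal sketch of what that looks like in practice, assuming biomaRt is installed and that the installed version's useMart() accepts the ensemblRedirect argument (this may differ between releases, so check ?useMart). The wrapper name connect_without_redirect is hypothetical, and ENSEMBL_MART_ENSEMBL is the name of the main Ensembl genes mart.

```r
# Hypothetical wrapper: connect to the Ensembl genes mart on a fixed host.
# Assumes useMart() in the installed biomaRt version accepts ensemblRedirect.
connect_without_redirect <- function(host = "www.ensembl.org") {
  biomaRt::useMart(
    biomart = "ENSEMBL_MART_ENSEMBL",
    host = host,
    ensemblRedirect = FALSE  # ask biomaRt not to follow the mirror redirect
  )
}

# Needs network access to the chosen host, so it is left commented out here:
# mart <- connect_without_redirect()
```

With redirection disabled, the firewall should in principle only need to allow the host you pass in, plus any different host="" values listed in that server's registry.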