BioMart: curl option when using getBM in a loop
Ni-Ar (Spain/Barcelona)

Hi Mike,

I'm getting a curl error when using the getBM() function.

ensembl@host
[1] "https://useast.ensembl.org:443/biomart/martservice?redirect=no"
ensembl@biomart
"ENSEMBL_MART_ENSEMBL"

getBM(attributes = c('ensembl_transcript_id', 'transcript_tsl'),
      filters = 'external_gene_name',
      values = my_fav_gene_of_interest,
      mart = ensembl,
      verbose = FALSE,
      uniqueRows = TRUE,
      quote = "\'")

Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [useast.ensembl.org:8443] Connection timed out after 10005 milliseconds


I am using getBM in a loop and I saw that one of the options of this function is curl.

An optional 'CURLHandle' object, that can be used to speed up getBM when used in a loop.

In the reference vignette I cannot really find info on how to properly use this option. I assume I should create a new handle and set some longer timeout options like:

library(curl)
h_long <- new_handle()
handle_setopt(h_long, timeout = 99999)


But I'm not really sure if this would help or if the syntax is right. Any suggestion or help would be greatly appreciated!
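For reference, a minimal sketch with the curl package: the setter is `handle_setopt()`, and `new_handle()` accepts the same options directly. Option names follow libcurl's `CURLOPT_*` names in lowercase, and both values below are in seconds. `timeout` caps the whole request, whereas `connecttimeout` caps only establishing the connection. The "10005 milliseconds" in the error looks like a connection-phase limit (the R curl package appears to default `connecttimeout` to 10 seconds), so raising `timeout` alone would likely not change it. The handle name `h_connect` and the values are illustrative. The documentation's "CURLHandle" wording suggests an RCurl handle, so whether `getBM()` in biomaRt 2.42 honours a curl-package handle is an untested assumption.

```r
library(curl)

# Sketch only: hypothetical handle name, illustrative values in seconds.
# new_handle() passes its arguments on to handle_setopt().
h_connect <- new_handle(
  connecttimeout = 60,  # limit for establishing the connection
  timeout = 300         # limit for the entire request
)

# Sanity check: both names are libcurl options known to this build.
known <- names(curl_options())
stopifnot(all(c("connecttimeout", "timeout") %in% known))

# Assumption, not confirmed in this thread: the handle might then be
# supplied through getBM()'s `curl` argument, e.g.
# getBM(..., mart = ensembl, curl = h_connect)
```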

I'm using curl version 4.3 and biomaRt version 2.42

Thanks a lot! Nicco

swbarnes2 (San Diego)
Error in curl::curl_fetch_memory(url, handle = handle) :
Timeout was reached: [useast.ensembl.org:8443] Connection timed out after 10005 milliseconds


Hold your horses. The timeout length is probably not the problem; the problem is that you can't connect to the site. If you can't connect, period, it doesn't matter how long you set the timer before giving up.

Try:

ensembl = useEnsembl(biomart='ensembl', dataset=ensembl_dataset, mirror = "uswest")


See if that works. Or maybe useast works now; I've had sporadic problems connecting to that one before.
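If one mirror is intermittently unreachable, trying a few mirrors in turn (once each, not in a tight loop) is one option. A minimal base-R sketch, where `first_working()` is a hypothetical helper and not part of biomaRt: it calls `connect()` on each candidate and returns the first result that does not raise an error. The mirror names "useast", "uswest" and "asia" are values accepted by `useEnsembl()`'s `mirror` argument; `ensembl_dataset` is assumed to be defined as in the reply above.

```r
# Hypothetical helper: try each candidate in turn and return the first
# result that does not raise an error. `connect` is any function taking
# one candidate (e.g. a mirror name) and returning a connection object.
first_working <- function(candidates, connect) {
  errors <- character(0)
  for (cand in candidates) {
    result <- tryCatch(connect(cand), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    # Remember why this candidate failed and report it.
    errors[cand] <- conditionMessage(result)
    message("Failed for '", cand, "': ", conditionMessage(result))
  }
  stop("All candidates failed:\n",
       paste0("  ", names(errors), ": ", errors, collapse = "\n"))
}

# Usage with biomaRt (requires network access):
# library(biomaRt)
# ensembl <- first_working(c("useast", "uswest", "asia"), function(m)
#   useEnsembl(biomart = "ensembl", dataset = ensembl_dataset, mirror = m))
```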


Thanks for the suggestion. I used useast only because I thought it was geographically closer to Europe (where I am now). I managed to connect again and can no longer reproduce the error.

James W. MacDonald (United States)

Also, repeatedly hitting the Biomart server using a loop is a good way to get banned. A far better idea is to ask for everything you want in one go and let biomaRt handle splitting it up if need be.


Wait... when you say "hitting the biomart server" are you referring to useEnsembl() or to the getBM() function?


Both of those functions contact the Ensembl web servers and ask them to respond. However, getBM() also runs a query on the specified database. There's considerable overhead in sending and receiving data from the server, plus launching the query, so it's more efficient to submit one query with many values than to iterate over them in a loop or apply function.


Okay, makes sense. And if the data were cached, would no query be sent to Ensembl BioMart at all? Thanks!


I am saying that doing something like

gns <- <long vector of gene IDs>
for (i in seq_along(gns)) getBM(attributes = c('ensembl_transcript_id', 'transcript_tsl'),
                                filters = 'external_gene_name',
                                values = gns[i],
                                mart = ensembl,
                                verbose = FALSE,
                                uniqueRows = TRUE,
                                quote = "\'")
## is pernicious and may get you banned. As compared to
getBM(attributes = c('ensembl_transcript_id', 'transcript_tsl'),
      filters = 'external_gene_name', values = gns, mart = ensembl)
## where you ask for everything at once and biomaRt does any necessary looping
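Asking for everything at once returns one combined data frame, so per-gene results have to be recovered afterwards. A minimal sketch, assuming `external_gene_name` is also requested as an attribute so each row can be matched to its input gene; `results_by_gene()` is a hypothetical helper, not a biomaRt function.

```r
# Hypothetical helper (not part of biomaRt): split a single combined
# getBM() result into one data frame per requested gene. Genes with no
# matching rows are kept as zero-row data frames.
results_by_gene <- function(res, genes, key = "external_gene_name") {
  # Using the input genes as factor levels keeps their order; rows whose
  # key is not among the requested genes become NA and are dropped.
  split(res, factor(res[[key]], levels = unique(genes)))
}

# Usage (requires biomaRt and network access; `gns` and `ensembl` as above).
# The filter column must also be requested as an attribute:
# res <- getBM(attributes = c("external_gene_name",
#                             "ensembl_transcript_id", "transcript_tsl"),
#              filters = "external_gene_name",
#              values = gns,
#              mart = ensembl)
# per_gene <- results_by_gene(res, gns)
```

This keeps the server traffic to a single `getBM()` call while still giving a loop-friendly list afterwards.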


Thanks for clarifying