BioMart: curl option when using getBM in loop
Ni-Ar ▴ 10
@ni-ar-15663
Spain/Barcelona

Hi Mike,

I'm getting a curl error when using the getBM() function.

ensembl@host
[1] "https://useast.ensembl.org:443/biomart/martservice?redirect=no"
ensembl@biomart
"ENSEMBL_MART_ENSEMBL"

getBM(attributes = c('ensembl_transcript_id','transcript_tsl'),
            filters = 'external_gene_name',
            values = my_fav_gene_of_interest,
            mart = ensembl,
            verbose = F,
            uniqueRows = T,
            quote = "\'")
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [useast.ensembl.org:8443] Connection timed out after 10005 milliseconds

I am using getBM in a loop and I saw that one of the options of this function is curl.

An optional 'CURLHandle' object, that can be used to speed up getBM when used in a loop.

In the reference vignette I cannot really find info on how to properly use this option. I assume I should create a new handle and set some longer timeout options like:

h_long <- new_handle()
handle_option(h_long, timeout = 99999)

But I'm not really sure if this would help or if the syntax is right. Any suggestion or help would be greatly appreciated!

I'm using curl version 4.3 and biomaRt version 2.42

Thanks a lot! Nicco

curl biomart getBM Tutorial • 9.1k views
swbarnes2 ★ 1.3k
@swbarnes2-14086
San Diego
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [useast.ensembl.org:8443] Connection timed out after 10005 milliseconds

Hold your horses. The length of the timeout is probably not the problem. The problem is that you can't connect to the site at all. It doesn't matter how long you wait before giving up if the connection can't be made, period.

Try:

ensembl = useEnsembl(biomart='ensembl', dataset=ensembl_dataset, mirror = "uswest")

See if that works. Or maybe useast works now; I've had sporadic problems connecting to that one before.
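If a mirror is only sporadically unreachable, one option is to try several mirrors in turn. The following is a minimal sketch, not something from the biomaRt documentation: connect_ensembl() is a hypothetical helper name, and the order of the mirrors is an arbitrary assumption. It only relies on useEnsembl() accepting the mirror argument shown above.

```r
# Sketch only: try each Ensembl mirror in turn and return the first mart
# object that can be created. connect_ensembl() is a hypothetical helper,
# not part of biomaRt.
connect_ensembl <- function(dataset,
                            mirrors = c("www", "uswest", "useast", "asia")) {
  for (m in mirrors) {
    mart <- tryCatch(
      biomaRt::useEnsembl(biomart = "ensembl", dataset = dataset, mirror = m),
      error = function(e) {
        # Report the failure and fall through to the next mirror
        message("Mirror '", m, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(mart)) {
      return(mart)
    }
  }
  stop("Could not connect to any Ensembl mirror", call. = FALSE)
}

# Usage (needs network access; ensembl_dataset as in the call above):
# ensembl <- connect_ensembl(ensembl_dataset)
```

This does not make an unreachable server reachable; it just saves retrying by hand when one mirror is down.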


Thanks for the suggestion. I used useast only because I thought it was (geographically) closer to Europe, where I am now. I managed to connect again and can no longer reproduce the error.

@james-w-macdonald-5106
United States

Also, repeatedly hitting the Biomart server using a loop is a good way to get banned. A far better idea is to ask for everything you want in one go and let biomaRt handle splitting it up if need be.


Wait... when you say "hitting the biomart server" are you referring to useEnsembl() or to the getBM() function?


Both of those functions contact the Ensembl web servers and ask them to respond. However, getBM() also runs a query on the specified database. There's quite an overhead in sending and receiving data from the server, plus launching the query, so it's more efficient to submit one query with many values than to iterate over them in a loop or apply function.


Okay, makes sense. And if the data were cached, would no query be sent to Ensembl BioMart? Thanks!


I am saying that doing something like

gns <- <long vector of gene IDs>
for (i in seq_along(gns)) {
  getBM(attributes = c('ensembl_transcript_id', 'transcript_tsl'),
        filters = 'external_gene_name',
        values = gns[i],
        mart = ensembl,
        verbose = FALSE,
        uniqueRows = TRUE,
        quote = "\'")
}
## is pernicious and may get you banned. As compared to
getBM(c('ensembl_transcript_id', 'transcript_tsl'), 'external_gene_name', gns, ensembl)
## where you ask for everything at once and biomaRt does any necessary looping
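
One practical detail when moving from the loop to a single query: the combined result no longer tells you which gene each row belongs to unless the filter column is also requested as an attribute. Here is a rough sketch of that pattern. fetch_tsl_by_gene() and its query_fun argument are hypothetical names introduced for illustration (query_fun exists only so the function can be tried offline with a stand-in for getBM()).

```r
# Sketch only: one batched query, then split the rows per gene.
# fetch_tsl_by_gene() is a hypothetical helper, not part of biomaRt.
fetch_tsl_by_gene <- function(genes, mart, query_fun = biomaRt::getBM) {
  res <- query_fun(
    # Request the filter column as an attribute too, so that every
    # returned row can be mapped back to the gene it came from.
    attributes = c("external_gene_name", "ensembl_transcript_id", "transcript_tsl"),
    filters = "external_gene_name",
    values = unique(genes),  # send each gene name only once
    mart = mart
  )
  # One data frame of transcripts per gene name
  split(res[, c("ensembl_transcript_id", "transcript_tsl")],
        res$external_gene_name)
}

# Usage (needs network access and an existing mart object):
# tsl <- fetch_tsl_by_gene(gns, ensembl)
# tsl[[gns[1]]]
```

Genes with no match are simply absent from the returned list, so check names(tsl) against gns if that matters for your analysis.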

Thanks for clarifying
