Could you advise me on how to run getBM() in a loop so that, when the connection times out, the query is rerun automatically until I get a result?
I see that there is an option called curl, but I have never used it, so I don't know how to use it.
library(biomaRt)
# Connect to Ensembl
mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
# Extract the hgnc_symbol and gene_biotype for each ensembl_gene_id in my list
hgnc <- getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype'),
              filters = 'ensembl_gene_id', values = mat$ensembl_gene_id, mart = mart)
Batch submitting query [=====================================================--------] 88% eta: 11s
Error in curl::curl_fetch_memory(url, handle = handle) :
  Timeout was reached: Connection timed out after 10003 milliseconds
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.34.1 edgeR_3.20.3 limma_3.34.5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 AnnotationDbi_1.40.0 magrittr_1.5
[4] BiocGenerics_0.24.0 progress_1.1.2 IRanges_2.12.0
[7] bit_1.1-12 lattice_0.20-35 R6_2.2.2
[10] rlang_0.1.6 httr_1.3.1 stringr_1.2.0
[13] blob_1.1.0 tools_3.4.3 parallel_3.4.3
[16] grid_3.4.3 Biobase_2.38.0 DBI_0.7
[19] assertthat_0.2.0 bit64_0.9-7 digest_0.6.13
[22] tibble_1.4.1 S4Vectors_0.16.0 bitops_1.0-6
[25] RCurl_1.95-4.9 memoise_1.1.0 RSQLite_2.0
[28] stringi_1.1.6 compiler_3.4.3 pillar_1.0.1
[31] prettyunits_1.0.2 stats4_3.4.3 XML_3.98-1.9
[34] locfit_1.5-9.1
I can't reproduce this error at the moment, but I suspect the cause was introduced a few weeks ago when I made some modifications necessitated by changes to Ensembl. I'll take a closer look and try to work around it.
I have never seen the message about batch submitting query, but you shouldn't need to do a loop anyway. I can get over 61,000 IDs mapped in maybe 20 seconds in one go.
I request more than 20,000 genes per tissue and run this request for several tissues. Sorry, my remark was not very clear.
I don't loop over each gene but over each tissue. Some of these requests fail. At the moment I manually rerun the failing tissue and the tissues that follow it, but I would like this to happen automatically.
The batch query message is something I've added in the latest release of biomaRt. Ensembl BioMart can have some issues if the list of filter values is large, so biomaRt now internally chunks them into batches of 500 and submits them sequentially. This can take a little longer, so it prints the progress bar so you know something's happening.
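To illustrate the idea of batching, here is a minimal sketch in plain R. It is not biomaRt's internal code; `split_into_batches()` and `query_in_batches()` are hypothetical helper names, and `query_fn` stands in for whatever function performs a single query and returns a data frame.

```r
# Hypothetical helper: split a vector of filter values into chunks of at
# most `batch_size` elements (500 mirrors the batch size mentioned above).
split_into_batches <- function(values, batch_size = 500) {
  split(values, ceiling(seq_along(values) / batch_size))
}

# Hypothetical helper: run `query_fn` on each batch in turn and stack the
# resulting data frames. `query_fn` is an assumption; it should take a
# vector of values and return a data frame.
query_in_batches <- function(values, query_fn, batch_size = 500) {
  batches <- split_into_batches(values, batch_size)
  results <- lapply(batches, query_fn)
  do.call(rbind, results)
}

# Example with made-up IDs: 1200 values give batches of 500, 500 and 200.
ids <- sprintf("ENSG%011d", 1:1200)
lengths(split_into_batches(ids))
```

Submitting the batches sequentially keeps each individual request small, at the cost of more round trips to the server.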
I've modified the default timeout to 30 seconds instead of 10. Without being able to run the code examples on your exact setup it's hard to know if this will be sufficient, but fingers crossed. Please report back here if it still fails; we may need to do a more extensive diagnosis.
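If a query still times out occasionally, one way to rerun it automatically is to wrap the call in tryCatch() and retry a few times. The following is only a sketch under assumptions: `retry()` is a hypothetical helper, not part of biomaRt, and `tissue_ids` in the commented usage is an assumed named list holding one vector of Ensembl gene IDs per tissue.

```r
# Hypothetical helper (not part of biomaRt): call `f` until it succeeds
# or `max_tries` attempts have failed, pausing `wait` seconds in between.
retry <- function(f, max_tries = 5, wait = 5) {
  for (i in seq_len(max_tries)) {
    # Capture any error as a condition object instead of aborting
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", i, " failed: ", conditionMessage(result))
    if (i < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed")
}

# Usage sketch (commented out because it needs a network connection).
# `tissue_ids` is an assumed named list of ID vectors, one per tissue.
# results <- lapply(tissue_ids, function(ids) {
#   retry(function() {
#     getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol', 'gene_biotype'),
#           filters = 'ensembl_gene_id', values = ids, mart = mart)
#   })
# })
```

With this approach a tissue that fails is simply retried in place, so the following tissues do not need to be rerun by hand. The number of attempts and the pause between them are arbitrary choices here.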
This update is available in biomaRt version 2.35.5. It'll appear in the devel branch shortly, but you can get the updated version immediately using: