Ensembl archive challenges
yunadal
Australia

Hello all,

I am having a lot of trouble accessing biomaRt/Ensembl at the moment.

I have GRCh38 chr:pos SNPs for which I would like the dbSNP151 rsIDs, and to this end I have been trying to query version 96 of Ensembl.

This worked once earlier today for 10 SNPs: 7:128935744:128935744, 3:119518880:119518880, 12:132463597:132463597, 4:40305571:40305571, 17:45379362:45379362, 6:32143247:32143247, 2:191084261:191084261, 6:32286483:32286483, 6:31331721:31331721, 6:32714358:32714358.

Since then, whenever I change the attributes I request, the query returns:

Error: biomaRt has encountered an unexpected server error.
Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

I cannot switch to a different mirror, because I need an archived version of Ensembl.

To debug this, I switched to the current Ensembl release, and now get:

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [asia.ensembl.org:443] Operation timed out after 300004 milliseconds with 0 bytes received

This happens despite trying different mirrors (www, asia, useast).

The current pipeline for GRCh38 SNPs returning dbSNP154 rsIDs is:

library(gwasrapidd)
library(biomaRt)

SLE <- get_variants(efo_id = "EFO_0002690")

# Build chr:start:end strings for the first 10 variants
# (column 4 = chromosome name, column 5 = position; start = end for a SNP)
SLEsnps <- paste(SLE@variants[[4]][1:10], SLE@variants[[5]][1:10],
                 SLE@variants[[5]][1:10], sep = ":")
ensembl <- useEnsembl(biomart = 'snps', dataset = 'hsapiens_snp', mirror = "www")
getBM(attributes = c("refsnp_id"), #stable code
       filters = c("chromosomal_region"),
       values = list(SLEsnps), 
       mart = ensembl)
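The chromosomal_region filter used above takes values of the form chr:start:end, and for a single-base SNP the start and end are the same position. A minimal base-R sketch of building those strings in one vectorized call, assuming the chromosome and position columns have already been pulled out of SLE@variants; make_snp_regions() is a hypothetical helper, not part of biomaRt or gwasrapidd. Formatting with sprintf("%.0f", ...) guards against paste() turning a double such as 40000000 into "4e+07".

```r
# Hypothetical helper (not part of biomaRt or gwasrapidd): build values for
# the BioMart "chromosomal_region" filter in the form chr:start:end.
# For a single-base SNP, start and end are the same position.
make_snp_regions <- function(chrom, pos) {
  # Drop variants that have no chromosome or position assigned
  keep <- !is.na(chrom) & !is.na(pos)
  pos <- as.numeric(pos[keep])
  # "%.0f" prints whole numbers without scientific notation, whereas
  # paste() on a double such as 40000000 would give "4e+07"
  sprintf("%s:%.0f:%.0f", as.character(chrom[keep]), pos, pos)
}

# Example with two of the coordinates from the question
make_snp_regions(c("7", "3"), c(128935744, 119518880))
# [1] "7:128935744:128935744" "3:119518880:119518880"
```

In the pipeline above this could replace the paste() call, e.g. SLEsnps <- make_snp_regions(SLE@variants[[4]][1:10], SLE@variants[[5]][1:10]), assuming columns 4 and 5 hold the chromosome name and position as the original indexing implies. Variants without coordinates are silently dropped, so the result may be shorter than the input.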

For dbSNP151 rsIDs I would instead use ensembl <- useEnsembl(biomart = 'snps', dataset = 'hsapiens_snp', version = 96)

What on earth am I doing wrong to have recurrent time-outs and internal server errors?

sessionInfo()

R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] BSgenome_1.58.0      rtracklayer_1.50.0   Biostrings_2.58.0   
 [4] XVector_0.30.0       GenomicRanges_1.42.0 GenomeInfoDb_1.26.7 
 [7] IRanges_2.24.1       S4Vectors_0.28.1     BiocGenerics_0.36.1 
[10] Matrix_1.3-4         biomaRt_2.46.3       gwasrapidd_0.99.11  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7                  lattice_0.20-44            
 [3] prettyunits_1.1.1           Rsamtools_2.6.0            
 [5] assertthat_0.2.1            utf8_1.2.2                 
 [7] BiocFileCache_1.14.0        R6_2.5.1                   
 [9] RSQLite_2.2.8               httr_1.4.2                 
[11] pillar_1.6.2                zlibbioc_1.36.0            
[13] rlang_0.4.11                progress_1.2.2             
[15] curl_4.3.2                  rstudioapi_0.13            
[17] blob_1.2.2                  BiocParallel_1.24.1        
[19] stringr_1.4.0               RCurl_1.98-1.4             
[21] bit_4.0.4                   tinytex_0.33               
[23] DelayedArray_0.16.3         compiler_4.0.4             
[25] xfun_0.25                   pkgconfig_2.0.3            
[27] askpass_1.1                 SummarizedExperiment_1.20.0
[29] openssl_1.4.5               tidyselect_1.1.1           
[31] tibble_3.1.4                GenomeInfoDbData_1.2.4     
[33] matrixStats_0.60.1          XML_3.99-0.7               
[35] fansi_0.5.0                 withr_2.4.2                
[37] crayon_1.4.1                dplyr_1.0.7                
[39] dbplyr_2.1.1                GenomicAlignments_1.26.0   
[41] bitops_1.0-7                rappdirs_0.3.3             
[43] grid_4.0.4                  lifecycle_1.0.0            
[45] DBI_1.1.1                   magrittr_2.0.1             
[47] cli_3.0.1                   stringi_1.7.4              
[49] cachem_1.0.6                xml2_1.3.2                 
[51] ellipsis_0.3.2              generics_0.1.0             
[53] vctrs_0.3.8                 tools_4.0.4                
[55] bit64_4.0.5                 Biobase_2.50.0             
[57] glue_1.4.2                  purrr_0.3.4                
[59] MatrixGenerics_1.2.1        hms_1.1.0                  
[61] fastmap_1.1.0               AnnotationDbi_1.52.0       
[63] BiocManager_1.30.16         memoise_2.0.0

Hey team,

Brief update --

I think the errors I am having are related to server load and time-outs, presumably on the Ensembl side.

By limiting each request to 5 SNPs, I get reliable responses from Ensembl 104 (the most recent release), with no time-outs.
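The batching that worked here can be automated. A minimal base-R sketch, assuming query_fn is a function you supply that takes a character vector of region strings and returns a data.frame; query_in_batches() is a hypothetical helper, and the batch size of 5 and the one-second pause are only the values suggested by this thread, not documented BioMart limits.

```r
# Hypothetical helper: run a query over `values` in batches of at most `size`,
# pausing between batches, and stack the resulting data frames.
# `query_fn` takes a character vector and returns a data.frame.
query_in_batches <- function(values, query_fn, size = 5, pause = 1) {
  batch_id <- ceiling(seq_along(values) / size)
  batches <- split(values, batch_id)   # batch order follows the input order
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- query_fn(batches[[i]])
    if (i < length(batches)) Sys.sleep(pause)   # be gentle with the server
  }
  do.call(rbind, results)
}
```

With biomaRt, query_fn could be function(v) getBM(attributes = "refsnp_id", filters = "chromosomal_region", values = list(v), mart = ensembl), reusing the ensembl object from the pipeline above. Note that returning only refsnp_id does not record which region each rsID came from.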

Unfortunately, archived versions are still flaky. For example, useEnsembl(biomart = 'snps', dataset = 'hsapiens_snp', version = 96) throws an internal server error after trying several servers. useEnsembl(biomart = 'snps', dataset = 'hsapiens_snp', version = 95) connects fine, but a subsequent getBM() query for 5 SNPs fails with "Error: biomaRt has encountered an unexpected server error." The same query works fine for a single SNP.

All a little odd -- is biomaRt usually this limited in its throughput?

Mike Smith
EMBL Heidelberg

Your query looks fine, and I'm afraid there's not much you can do to make it faster when you're querying the most recent Ensembl release. Ensembl BioMart is a complicated tool, and its performance is hard to predict. It looks to me like this particular combination of filters, attributes and the human SNP dataset is very slow to query. You see the same slowness if you run the query in a web browser rather than via biomaRt. I wish I could explain why it is so slow, but BioMart is pretty opaque about the operations it carries out in the background.

Normally I'd advise that a single query with multiple values is more efficient than lots of small queries. However, if you hit BioMart's 5-minute time limit you get nothing back, and that seems to happen even for a very small number of query values. As you've figured out, it's probably most reliable to run individual queries, but I expect it will still be painfully slow.
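Following that advice, individual queries can be made more robust by retrying failures rather than letting one timeout or server error abort the whole loop. A minimal base-R sketch; with_retries() and query_one_by_one() are hypothetical helpers, and the retry count and wait time are arbitrary assumptions. Retrying cannot help when the whole archive is offline.

```r
# Hypothetical helper: wrap a query function so that errors (for example a
# BioMart timeout or "unexpected server error") are retried a few times
# instead of stopping the whole run. Returns NULL if every attempt fails.
with_retries <- function(fn, tries = 3, wait = 10) {
  force(fn); force(tries); force(wait)   # fix arguments now, not at call time
  function(...) {
    for (attempt in seq_len(tries)) {
      result <- tryCatch(fn(...), error = function(e) e)
      if (!inherits(result, "error")) return(result)
      message("Attempt ", attempt, " of ", tries, " failed: ",
              conditionMessage(result))
      if (attempt < tries) Sys.sleep(wait)
    }
    NULL
  }
}

# Hypothetical helper: query one value at a time; values whose query failed
# end up as NULL entries so they can be re-run later.
query_one_by_one <- function(values, query_fn) {
  out <- lapply(values, query_fn)
  names(out) <- values
  out
}
```

For example, safe_getBM <- with_retries(function(v) getBM(attributes = "refsnp_id", filters = "chromosomal_region", values = list(v), mart = ensembl)) followed by query_one_by_one(SLEsnps, safe_getBM); any NULL entries in the result mark SNPs to try again.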

Regarding the "Internal Error 500" when trying version 96, I get the same problem when visiting that archive page (http://apr2019.archive.ensembl.org/index.html) in a browser. It seems like that entire Ensembl archive is offline at the moment, so you won't be able to connect to the relevant BioMart regardless of the query you want to run.
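Before building a mart for an archived release, it can save time to check whether the archive site answers at all, the programmatic equivalent of opening the page in a browser. A minimal sketch using base R's curlGetHeaders(), assuming R was built with libcurl support; ensembl_site_status() and site_is_up() are hypothetical helpers, and a successful status for the web page does not guarantee that the BioMart service behind it is working.

```r
# Hypothetical helper: return the HTTP status code of a page, or NA if the host
# cannot be reached at all (DNS failure, refused connection, no libcurl, ...).
ensembl_site_status <- function(url) {
  headers <- tryCatch(curlGetHeaders(url), error = function(e) NULL)
  if (is.null(headers)) return(NA_integer_)
  # curlGetHeaders() stores the last status code received in the "status" attribute
  as.integer(attr(headers, "status"))
}

# Treat codes below 400 as "up"; 5xx codes (such as the 500 seen for the
# April 2019 archive) and unreachable hosts (NA) count as "down".
site_is_up <- function(status) !is.na(status) & status < 400

# Example (needs network access):
# site_is_up(ensembl_site_status("http://apr2019.archive.ensembl.org/index.html"))
```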
