biomaRt has encountered an unknown server error while using getLDS
@a4aee138
United Kingdom

Hello,

I'm trying to link two datasets using getLDS to obtain Ensembl gene IDs and gene symbols for both, but I've run into the error shown below. If I use mirror = "asia" the error is HTTP error code: 404 rather than the one below, and if I use mirror = "www" the message is Error: biomaRt has encountered an unexpected server error.

human <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", mirror="useast")
macaque <- useEnsembl("ensembl", dataset = "mmulatta_gene_ensembl", mirror="useast")

human_macaque <-   getLDS(attributes = c("ensembl_gene_id", "external_gene_name"), 
                                filters = "ensembl_gene_id", 
                                values = humangenes, 
                                mart = human, 
                                attributesL = c("ensembl_gene_id", "external_gene_name"),
                                martL = macaque)
Error: biomaRt has encountered an unknown server error. HTTP error code: 502
Please report this on the Bioconductor support site at https://support.bioconductor.org/
Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices datasets  utils     methods  
[8] base     

other attached packages:
 [1] readxl_1.4.3                tximport_1.30.0            
 [3] RColorBrewer_1.1-3          biomaRt_2.58.2             
 [5] DESeq2_1.42.1               SummarizedExperiment_1.32.0
 [7] Biobase_2.62.0              MatrixGenerics_1.14.0      
 [9] matrixStats_1.0.0           GenomicRanges_1.54.1       
[11] GenomeInfoDb_1.38.8         IRanges_2.36.0             
[13] S4Vectors_0.40.2            BiocGenerics_0.48.1        
[15] lubridate_1.9.3             forcats_1.0.0              
[17] stringr_1.5.0               dplyr_1.1.3                
[19] purrr_1.0.2                 readr_2.1.4                
[21] tidyr_1.3.0                 tibble_3.2.1               
[23] ggplot2_3.4.4               tidyverse_2.0.0            
[25] reticulate_1.34.0          

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0        farver_2.1.1            blob_1.2.4             
 [4] filelock_1.0.2          Biostrings_2.70.3       bitops_1.0-7           
 [7] fastmap_1.1.1           RCurl_1.98-1.12         BiocFileCache_2.10.2   
[10] XML_3.99-0.14           digest_0.6.33           timechange_0.2.0       
[13] lifecycle_1.0.3         KEGGREST_1.42.0         RSQLite_2.3.2          
[16] magrittr_2.0.3          compiler_4.3.1          rlang_1.1.1            
[19] progress_1.2.2          tools_4.3.1             utf8_1.2.4             
[22] yaml_2.3.7              labeling_0.4.3          prettyunits_1.2.0      
[25] S4Arrays_1.2.1          curl_5.1.0              bit_4.0.5              
[28] DelayedArray_0.28.0     xml2_1.3.5              abind_1.4-5            
[31] BiocParallel_1.36.0     withr_2.5.1             grid_4.3.1             
[34] fansi_1.0.5             colorspace_2.1-0        scales_1.2.1           
[37] cli_3.6.1               crayon_1.5.2            generics_0.1.3         
[40] httr_1.4.7              tzdb_0.4.0              DBI_1.1.3              
[43] cachem_1.0.8            zlibbioc_1.48.2         parallel_4.3.1         
[46] AnnotationDbi_1.64.1    cellranger_1.1.0        BiocManager_1.30.22    
[49] XVector_0.42.0          vctrs_0.6.4             Matrix_1.6-1.1         
[52] jsonlite_1.8.7          hms_1.1.3               bit64_4.0.5            
[55] locfit_1.5-9.8          glue_1.6.2              codetools_0.2-19       
[58] stringi_1.7.12          gtable_0.3.4            munsell_0.5.0          
[61] pillar_1.9.0            rappdirs_0.3.3          GenomeInfoDbData_1.2.11
[64] dbplyr_2.4.0            R6_2.5.1                vroom_1.6.4            
[67] lattice_0.21-8          png_0.1-8               memoise_2.0.1          
[70] renv_1.0.3              Rcpp_1.0.11             SparseArray_1.2.4      
[73] pkgconfig_2.0.3

Interestingly, if I do the same but call an old archive, there's no error:

useMart(biomart='ensembl', host='https://jul2019.archive.ensembl.org/') %>% useDataset(dataset='hsapiens_gene_ensembl') -> human

and if I use

useMart(biomart='ensembl', host='https://apr2022.archive.ensembl.org/') %>% useDataset(dataset='hsapiens_gene_ensembl') -> human

the error is

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [apr2022.archive.ensembl.org:443] Operation timed out after 60001 milliseconds with 0 bytes received

I would appreciate any help with this. Thank you very much, Miriam

Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

There's a lot to unpack here!

First, unfortunately the getLDS functionality started failing with the release of BioMart 106. I've asked Ensembl support several times if they can fix this, but no solution has been found. This is not a biomaRt issue; you get a similar error if you run the query in the web interface. This is why it works fine when you use the jul2019 archive, which predates that release.

I don't want to remove the getLDS() function completely, because it still works fine for some queries, but it's very hard to detect what will break without running it when I don't know the root cause.

To work around this in some circumstances I've been developing some additional code to query Ensembl in a more reliable manner. In your case there's now a function called getHomologs() which I think might help. You'll need to install the development version of biomaRt because this is very new. You can do that with: BiocManager::install('grimbough/biomaRt').

Here's an example of how you might be able to use it:

library(biomaRt)

humangenes <- c('ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419',
'ENSG00000000457', 'ENSG00000000460', 'ENSG00000000938', 'ENSG00000000971', 
'ENSG00000001036', 'ENSG00000001084', 'ENSG00000001167')

human <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", mirror="useast")
macaque <- useEnsembl("ensembl", dataset = "mmulatta_gene_ensembl", mirror="useast")

homologs <- getHomologs(ensembl_gene_ids = humangenes, species_from = 'human', species_to = 'macaque')

human_tab <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), 
      filters = "ensembl_gene_id", 
      values = humangenes, 
      mart = human)

macaque_tab <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"), 
      filters = "ensembl_gene_id", 
      values = homologs$mmulatta_homolog_ensembl_gene, 
      mart = macaque)

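# Join on the human gene ID (column 1 of both tables), then on the macaque
# gene ID (column 2 of the merged table, column 1 of macaque_tab).
# all = TRUE keeps human genes that have no macaque homolog.
homologs_with_gene_names <- merge(homologs, human_tab, by = 1) |> 
  merge(macaque_tab, by.x = 2, by.y = 1, all = TRUE, suffixes = c('_human', '_macaque'))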

homologs_with_gene_names
#>    mmulatta_homolog_ensembl_gene ensembl_gene_id external_gene_name_human external_gene_name_macaque
#> 1                                ENSG00000000003                   TSPAN6                       <NA>
#> 2             ENSMMUG00000002759 ENSG00000000419                     DPM1                       DPM1
#> 3             ENSMMUG00000005366 ENSG00000001036                    FUCA2                      FUCA2
#> 4             ENSMMUG00000008212 ENSG00000000005                     TNMD                       TNMD
#> 5             ENSMMUG00000008684 ENSG00000001084                     GCLC                       GCLC
#> 6             ENSMMUG00000014434 ENSG00000000938                      FGR                        FGR
#> 7             ENSMMUG00000016582 ENSG00000000460                    FIRRM                      FIRRM
#> 8             ENSMMUG00000016583 ENSG00000000457                    SCYL3                      SCYL3
#> 9             ENSMMUG00000021173 ENSG00000001167                     NFYA                       NFYA
#> 10            ENSMMUG00000045497 ENSG00000000971                      CFH                        CFH
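
The two-step merge() above can be tried offline, without contacting Ensembl. The sketch below uses three rows copied from the output above as hand-typed data frames; it assumes that getHomologs() returns the human ID in the first column and the macaque ID in the second, which is what the column positions in the merge calls imply. It is only meant to show how by = 1, by.x = 2 and the suffixes behave, not to reproduce the real query.

```r
# Toy stand-ins for the three tables (values copied from the output above).
# Assumption: column order matches what getHomologs() and getBM() return.
homologs <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000419", "ENSG00000001036"),
  mmulatta_homolog_ensembl_gene = c("", "ENSMMUG00000002759", "ENSMMUG00000005366")
)
human_tab <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000419", "ENSG00000001036"),
  external_gene_name = c("TSPAN6", "DPM1", "FUCA2")
)
macaque_tab <- data.frame(
  ensembl_gene_id = c("ENSMMUG00000002759", "ENSMMUG00000005366"),
  external_gene_name = c("DPM1", "FUCA2")
)

# Step 1: match on the first column of both tables (the human gene ID).
step1 <- merge(homologs, human_tab, by = 1)

# Step 2: match column 2 of step1 (macaque ID) to column 1 of macaque_tab.
# all = TRUE keeps the human gene with no homolog; the two
# external_gene_name columns are told apart by the suffixes.
result <- merge(step1, macaque_tab, by.x = 2, by.y = 1,
                all = TRUE, suffixes = c("_human", "_macaque"))

print(result)
```

The gene with an empty homolog ID is kept because of all = TRUE, and its macaque gene name comes back as NA, as in row 1 of the real output.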

I'm not sure of the reason behind the errors for the Asia mirror or the April 2022 archive. Sometimes Ensembl can just be a bit unreliable with the volume of traffic the site receives. Both seem to be live for me now, although I think you'll run into the same problem with getLDS() on the April 2022 archive.
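
For the intermittent mirror errors specifically (not the getLDS() failure itself, which switching mirror is unlikely to fix), one option is to try each mirror in turn and keep the first that responds. The helper below is a minimal sketch: try_mirrors() is a hypothetical name, not part of biomaRt, and it takes the connection step as a function so the logic can be checked without network access.

```r
# Hypothetical helper (not a biomaRt function): call query_fun(mirror) for
# each mirror in turn and return the first result that does not error.
# Note: a NULL return value from query_fun is treated as a failure.
try_mirrors <- function(query_fun, mirrors = c("www", "useast", "asia")) {
  for (m in mirrors) {
    result <- tryCatch(
      query_fun(m),
      error = function(e) {
        message("Mirror '", m, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) return(result)
  }
  stop("All mirrors failed: ", paste(mirrors, collapse = ", "))
}

# Offline demonstration with a stand-in for the real connection call:
# it pretends that "www" is down and every other mirror is up.
fake_connect <- function(m) {
  if (m == "www") stop("HTTP error code: 502")
  paste("connected to", m)
}
demo <- try_mirrors(fake_connect)

# With biomaRt loaded and network access, the same helper could wrap
# useEnsembl(), for example:
# human <- try_mirrors(function(m) {
#   useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", mirror = m)
# })
```

Passing the connection step as a function keeps the retry logic separate from biomaRt, so it can wrap getBM() calls in the same way.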

@a4aee138
United Kingdom

This has worked very well for me. Thank you for your help!!
