biomaRt getLDS error
mark.dunning ▴ 20
@markdunning-9383
Last seen 14 months ago
Sheffield, UK

Hi all,

I'm trying to map gene IDs between species and hoping to use the getLDS function from biomaRt as I have done previously. However, the example in the biomaRt vignette doesn't seem to work, and I get the same error on my code.

Does anyone have suggestions on how to fix this? Many thanks.

human <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
mouse <- useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")
getLDS(attributes = c("hgnc_symbol","chromosome_name", "start_position"),
       filters = "hgnc_symbol", values = "TP53",
       mart = human,
       attributesL = c("refseq_mrna","chromosome_name","start_position"), 
       martL = mouse)
## Error in `httr2::req_perform()`:
## ! HTTP 500 Internal Server Error.
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.60.1

loaded via a namespace (and not attached):
 [1] rappdirs_0.3.3          utf8_1.2.4              generics_0.1.3         
 [4] xml2_1.3.6              RSQLite_2.3.7           stringi_1.8.4          
 [7] hms_1.1.3               digest_0.6.36           magrittr_2.0.3         
[10] evaluate_0.24.0         fastmap_1.2.0           blob_1.2.4             
[13] jsonlite_1.8.8          progress_1.2.3          AnnotationDbi_1.66.0   
[16] GenomeInfoDb_1.40.1     DBI_1.2.3               httr_1.4.7             
[19] purrr_1.0.2             fansi_1.0.6             UCSC.utils_1.0.0       
[22] Biostrings_2.72.1       httr2_1.0.1             cli_3.6.3              
[25] rlang_1.1.4             crayon_1.5.3            dbplyr_2.5.0           
[28] XVector_0.44.0          Biobase_2.64.0          bit64_4.0.5            
[31] withr_3.0.0             cachem_1.1.0            yaml_2.3.9             
[34] tools_4.4.1             memoise_2.0.1           dplyr_1.1.4            
[37] filelock_1.0.3          GenomeInfoDbData_1.2.12 BiocGenerics_0.50.0    
[40] curl_5.2.1              vctrs_0.6.5             R6_2.5.1               
[43] png_0.1-8               stats4_4.4.1            lifecycle_1.0.4        
[46] BiocFileCache_2.12.0    zlibbioc_1.50.0         KEGGREST_1.44.1        
[49] stringr_1.5.1           S4Vectors_0.42.1        IRanges_2.38.1         
[52] bit_4.0.5               pkgconfig_2.0.3         pillar_1.9.0           
[55] glue_1.7.0              tidyselect_1.2.1        tibble_3.2.1           
[58] xfun_0.45               rstudioapi_0.16.0       knitr_1.48             
[61] htmltools_0.5.8.1       rmarkdown_2.27          compiler_4.4.1         
[64] prettyunits_1.2.0
@james-w-macdonald-5106
Last seen 1 day ago
United States

I don't have a fix for biomaRt, but there is a workaround using NCBI orthology mappings instead.

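mapIt <- function(x) {
    require("Orthology.eg.db", character.only = TRUE)
    require("org.Mm.eg.db", character.only = TRUE)
    require("org.Hs.eg.db", character.only = TRUE)
    ## human symbol or alias -> human NCBI Gene ID; ALIAS also matches old
    ## symbols, so a key can resolve to a gene with a different current symbol
    human <- mapIds(org.Hs.eg.db, x, "ENTREZID", "ALIAS")
    ## human NCBI Gene ID -> mouse NCBI Gene ID
    mapped <- select(Orthology.eg.db, human, "Mus.musculus","Homo.sapiens")
    ## mouse NCBI Gene ID -> mouse symbol
    mouse <- mapIds(org.Mm.eg.db, as.character(mapped[,2]), "SYMBOL","ENTREZID")
    ## replace any NULL entries with NA and flatten to a vector
    mouse <- do.call(c, lapply(mouse, function(x) if(is.null(x)) return(NA) else return(x)))
    cbind(x, mapped, mouse)
}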

> mapIt("TP53")
'select()' returned 1:1 mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
        x Homo.sapiens Mus.musculus mouse
TP53 TP53         7157        22059 Trp53

> z <- keys(org.Hs.eg.db, "SYMBOL")[sample(1:5000, 40)]
> mapIt(z)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:1 mapping between keys and columns
              x Homo.sapiens Mus.musculus   mouse
PAM         PAM         5066        18484     Pam
EPS15     EPS15         2060        13858   Eps15
ENDOG     ENDOG         2021        13804   Endog
GK4P       GK4P         2716           NA    <NA>
GTF2F2   GTF2F2         2963        68705  Gtf2f2
NDUFB2   NDUFB2         4708        68198  Ndufb2
LCN1       LCN1         3933           NA    <NA>
CD6         CD6          923        12511     Cd6
CLPS       CLPS         1208       109791    Clps
HSPG2     HSPG2         3339        15530   Hspg2
LSP1       LSP1         4046        16985    Lsp1
IL1B       IL1B         3553        16176    Il1b
CCND3     CCND3          896        12445   Ccnd3
CBR3       CBR3          874       109857    Cbr3
PKIA       PKIA         5569        18767    Pkia
NECTIN1 NECTIN1         5818        58235 Nectin1
CEACAM8 CEACAM8         1088           NA    <NA>
ETF1       ETF1         2107       225363    Etf1
APOH       APOH          350        11818    Apoh
CYLC1     CYLC1         1538        67407   Cylc1
KCNN3     KCNN3         3782       140493   Kcnn3
FTH1P10 FTH1P10         2502           NA    <NA>
BMPR2     BMPR2          659        12168   Bmpr2
GSN         GSN         2934       227753     Gsn
MS4A2     MS4A2         2206        14126   Ms4a2
PGM1       PGM1         5236        72157    Pgm1
PHKG1P3 PHKG1P3         5262           NA    <NA>
C8G         C8G          733        69379     C8g
ARF5       ARF5          381        11844    Arf5
RBMS1P1 RBMS1P1         5938           NA    <NA>
ATP1A4   ATP1A4          480        27222  Atp1a4
IMPA1     IMPA1         3612        55980   Impa1
ENPP3     ENPP3         5169       209558   Enpp3
PTPRN     PTPRN         5798        19275   Ptprn
DOCK3     DOCK3         1795       208869   Dock3
COL6A1   COL6A1         1291        12833  Col6a1
FANCA     FANCA         2175        14087   Fanca
RBP2       RBP2         5927       214899   Kdm5a
CSRP1     CSRP1         1465        13007   Csrp1
PTPRO     PTPRO         5800        19277   Ptpro

I hit the go button too early. The post has since been updated, so you may want to go to the support site for the actual post.


Many thanks. I will check this out. Unfortunately, the organisms I want to map to are less well-annotated and don't have an org.XX.XX.db package, so I'm not sure how well this will work.


It's actually based on NCBI IDs, not the symbols, and NCBI has many different species so it might be fine. I only use the OrgDbs to map from NCBI ID to SYMBOL (using HGNC symbols for non-model organisms will be problematic anyway).

But if you have only Ensembl IDs, then it could be quite problematic.

Mike Smith ★ 6.6k
@mike-smith
Last seen 4 weeks ago
EMBL Heidelberg

Yes, unfortunately getLDS() for some queries is pretty broken server side. I've kept the function around in the hope that Ensembl would fix it, but that doesn't seem like it's going to happen.

What specifically are you trying to do, i.e. which organisms and attributes do you need? The devel version of biomaRt has a function getHomologs() which tries to provide a simplified interface to the biomaRt homologs tables to match based on Ensembl IDs, e.g.

library(biomaRt)
getHomologs('ENSG00000141510', 
            species_from = "human", 
            species_to = "mouse")
#> Your search term was ambigous and multiple strains matching 'mouse' were found.
#> Selecting the reference genome for this organism.
#> Use a more specific search term if this is inappropriate.
#>   ensembl_gene_id mmusculus_homolog_ensembl_gene
#> 1 ENSG00000141510             ENSMUSG00000059552

If you're using non-Ensembl IDs you'll need to do something more complicated like a regular call to getBM() for one organism to convert to Ensembl IDs and then a second biomaRt call to get any additional values you want for the second organism.
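
As a rough sketch of that multi-step approach (untested against every Ensembl release, so treat it as a starting point): the code below chains plain getBM() calls for human symbols to mouse genes. The function names join_homolog_tables() and symbols_to_mouse() are made up for illustration. It assumes the mmusculus_homolog_ensembl_gene attribute shown in the getHomologs() output above can be requested from the human dataset. The homolog lookup is its own query because, as far as I know, BioMart will not mix attributes from different attribute pages in one request.

```r
# Pure helper: chain the three lookup tables into one data frame.
# ids:      hgnc_symbol + ensembl_gene_id (human)
# homologs: ensembl_gene_id (human) + mmusculus_homolog_ensembl_gene
# info:     annotation for the mouse genes, keyed by ensembl_gene_id (mouse)
join_homolog_tables <- function(ids, homologs, info) {
  out <- merge(ids, homologs, by = "ensembl_gene_id", all.x = TRUE)
  # Prefix the mouse columns so they cannot clash with the human ones
  names(info) <- paste0("mouse_", names(info))
  merge(out, info,
        by.x = "mmusculus_homolog_ensembl_gene",
        by.y = "mouse_ensembl_gene_id",
        all.x = TRUE)
}

# Network-dependent wrapper (hypothetical name); needs biomaRt installed.
symbols_to_mouse <- function(symbols) {
  human <- biomaRt::useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
  mouse <- biomaRt::useEnsembl("ensembl", dataset = "mmusculus_gene_ensembl")

  # Step 1: HGNC symbol -> human Ensembl gene ID
  ids <- biomaRt::getBM(attributes = c("hgnc_symbol", "ensembl_gene_id"),
                        filters = "hgnc_symbol", values = symbols,
                        mart = human)

  # Step 2: human Ensembl gene ID -> mouse homolog Ensembl gene ID
  hom <- biomaRt::getBM(attributes = c("ensembl_gene_id",
                                       "mmusculus_homolog_ensembl_gene"),
                        filters = "ensembl_gene_id",
                        values = ids$ensembl_gene_id, mart = human)

  # Step 3: extra attributes for the mouse genes (skip empty homolog entries)
  mouse_ids <- unique(hom$mmusculus_homolog_ensembl_gene)
  mouse_ids <- mouse_ids[!is.na(mouse_ids) & nzchar(mouse_ids)]
  info <- biomaRt::getBM(attributes = c("ensembl_gene_id",
                                        "external_gene_name",
                                        "chromosome_name",
                                        "start_position"),
                         filters = "ensembl_gene_id",
                         values = mouse_ids, mart = mouse)

  join_homolog_tables(ids, hom, info)
}
```

Calling symbols_to_mouse("TP53") should then return one row per human/mouse gene pair, provided the Ensembl server responds. For other species the homolog attribute name will differ, so check listAttributes() on the human mart first.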


Sounds great, but it's not working for me for some reason. I'm trying to match Human to Dog and Human to Gorilla using Ensembl IDs.

I just started up a new Bioconductor docker image and ran your code.

docker run \
        -e PASSWORD=bioc \
        -p 8787:8787 \
        bioconductor/bioconductor_docker:devel
library(biomaRt)
getHomologs('ENSG00000141510', 
            species_from = "human", 
            species_to = "mouse")
Error in getBM(attributes = c("ensembl_gene_id", homolog_attribute), mart = mart,  : 
  Invalid attribute(s): _homolog_ensembl_gene 
Please use the function 'listAttributes' to get valid attribute names
> sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.61.2

loaded via a namespace (and not attached):
 [1] KEGGREST_1.45.1         xfun_0.46               httr2_1.0.2             Biobase_2.65.0          vctrs_0.6.5            
 [6] tools_4.4.1             generics_0.1.3          stats4_4.4.1            curl_5.2.1              tibble_3.2.1           
[11] fansi_1.0.6             AnnotationDbi_1.67.0    RSQLite_2.3.7           blob_1.2.4              pkgconfig_2.0.3        
[16] dbplyr_2.5.0            S4Vectors_0.43.2        lifecycle_1.0.4         GenomeInfoDbData_1.2.12 compiler_4.4.1         
[21] stringr_1.5.1           Biostrings_2.73.1       progress_1.2.3          GenomeInfoDb_1.41.1     htmltools_0.5.8.1      
[26] yaml_2.3.10             pillar_1.9.0            crayon_1.5.3            cachem_1.1.0            tidyselect_1.2.1       
[31] digest_0.6.36           stringi_1.8.4           dplyr_1.1.4             purrr_1.0.2             fastmap_1.2.0          
[36] cli_3.6.3               magrittr_2.0.3          utf8_1.2.4              withr_3.0.1             prettyunits_1.2.0      
[41] filelock_1.0.3          UCSC.utils_1.1.0        rappdirs_0.3.3          bit64_4.0.5             rmarkdown_2.27         
[46] XVector_0.45.0          httr_1.4.7              bit_4.0.5               png_0.1-8               hms_1.1.3              
[51] memoise_2.0.1           evaluate_0.24.0         knitr_1.48              IRanges_2.39.2          BiocFileCache_2.13.0   
[56] rlang_1.1.4             glue_1.7.0              DBI_1.2.3               BiocManager_1.30.23     xml2_1.3.6             
[61] BiocGenerics_0.51.0     pkgload_1.4.0           rstudioapi_0.16.0       jsonlite_1.8.8          R6_2.5.1               
[66] zlibbioc_1.51.1

There was a bug causing it to fail with the search term 'mouse', which is now patched, but the fix hasn't propagated yet.

Give it a try with "gorilla" or "dog" and see if that works better. The bug only affected a subset of species where there were multiple strains in Ensembl.


Amazing. Thanks so much for that!
