Question

Mouse to Human Genes using bioconductor

Max ▴ 10

I have a function that I am trying to use to convert mouse gene symbols to human gene symbols. When I run it with the call shown below the function, I get this error:

Error: biomaRt has encountered an unknown server error. HTTP error code: 502
Please report this on the Bioconductor support site at https://support.bioconductor.org/
Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

library(biomaRt)

# Convert a vector of mouse gene symbols (MGI) to human gene symbols (HGNC).
convertMouseGeneList2 <- function(x) {
  human <- useEnsembl(biomart = "ensembl", dataset = "hsapiens_gene_ensembl", mirror = "uswest")
  mouse <- useEnsembl(biomart = "ensembl", dataset = "mmusculus_gene_ensembl", mirror = "uswest")
  # Linked query: filter on the mouse symbols and return the linked human symbols.
  genesV2 <- getLDS(attributes = c("mgi_symbol"), filters = "mgi_symbol", values = x, mart = mouse,
                    attributesL = c("hgnc_symbol"), martL = human, uniqueRows = TRUE)
  humanx <- unique(genesV2[, 2])
  print(head(humanx))
  return(humanx)
}

humgenes <- convertMouseGeneList2(musgenes)

As you can see, I am trying to use the uswest mirror. The useast and asia mirrors give a similar error. FYI, the musgenes object is just a character vector of gene symbols ([1] "Gm29216" "Gm15501" "Rpl9-ps6" "Gm10563" "Adam8" "Gm10801" "Glipr1" ...). Could someone please help me figure this out? It seems a little beyond what I am able to fix on my own. The error message directed me to this website, so I am hoping this is the right place to ask.
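For anyone who wants to check the mirrors systematically rather than by hand, here is a minimal sketch of a fallback loop. The helper name `try_mirrors` and the stand-in `fake_connect` are hypothetical and exist only for illustration; they are not part of biomaRt. The stand-in simulates the 502 failures so that the sketch runs without network access.

```r
# Hypothetical helper: call `connect(mirror)` for each mirror in turn and
# return the first result that does not raise an error, together with a
# record of the mirrors that failed before it.
try_mirrors <- function(mirrors, connect) {
  failures <- character(0)
  for (m in mirrors) {
    result <- tryCatch(connect(m), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(mirror = m, value = result, failures = failures))
    }
    failures <- c(failures, sprintf("%s: %s", m, conditionMessage(result)))
  }
  stop("all mirrors failed:\n", paste(failures, collapse = "\n"))
}

# Stand-in connector (an assumption for the demo, not a real server call):
# it pretends that only the "asia" mirror answers.
fake_connect <- function(mirror) {
  if (mirror != "asia") stop("HTTP error code: 502")
  paste("connected to", mirror)
}

res <- try_mirrors(c("uswest", "useast", "asia"), fake_connect)
print(res$mirror)
print(res$failures)
```

With biomaRt, `connect` could be something like `function(m) useEnsembl(biomart = "ensembl", dataset = "mmusculus_gene_ensembl", mirror = m)`. If the 502 is raised inside `getLDS()` rather than `useEnsembl()`, the whole conversion would have to go inside `connect` instead, and when every mirror is down (as seems to be the case here) the loop can only report that fact.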

Here is my sessionInfo():

R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pegas_1.2                               ape_5.7-1                               biomaRt_2.54.1                         
 [4] DESeq2_1.38.3                           R.utils_2.12.2                          R.oo_1.25.0                            
 [7] R.methodsS3_1.8.2                       ggrepel_0.9.3                           RColorBrewer_1.1-3                     
[10] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.50.4                  AnnotationDbi_1.60.2                   
[13] reshape2_1.4.4                          clusterProfiler_4.6.2                   ChIPseeker_1.34.1                      
[16] pheatmap_1.0.12                         ggplot2_3.4.2                           SummarizedExperiment_1.28.0            
[19] Biobase_2.58.0                          GenomicRanges_1.50.2                    GenomeInfoDb_1.34.9                    
[22] MatrixGenerics_1.10.0                   matrixStats_0.63.0                      IRanges_2.32.0                         
[25] S4Vectors_0.36.2                        BiocGenerics_0.44.0                     stringr_1.5.0                          
[28] data.table_1.14.8                       readxl_1.4.2                           

loaded via a namespace (and not attached):
  [1] shadowtext_0.1.2         fastmatch_1.1-3          BiocFileCache_2.6.1      plyr_1.8.8               igraph_1.4.2            
  [6] lazyeval_0.2.2           splines_4.2.3            BiocParallel_1.32.6      digest_0.6.31            yulab.utils_0.0.6       
 [11] htmltools_0.5.5          GOSemSim_2.24.0          viridis_0.6.3            GO.db_3.16.0             fansi_1.0.4             
 [16] magrittr_2.0.3           memoise_2.0.1            annotate_1.76.0          Biostrings_2.66.0        graphlayouts_1.0.0      
 [21] enrichplot_1.13.1.992    prettyunits_1.1.1        colorspace_2.1-0         blob_1.2.4               rappdirs_0.3.3          
 [26] xfun_0.39                dplyr_1.1.2              crayon_1.5.2             RCurl_1.98-1.12          jsonlite_1.8.4          
 [31] scatterpie_0.1.9         glue_1.6.2               polyclip_1.10-4          gtable_0.3.3             zlibbioc_1.44.0         
 [36] XVector_0.38.0           DelayedArray_0.24.0      scales_1.2.1             DOSE_3.24.2              DBI_1.1.3               
 [41] Rcpp_1.0.10              plotrix_3.8-2            xtable_1.8-4             viridisLite_0.4.2        progress_1.2.2          
 [46] gridGraphics_0.5-1       tidytree_0.4.2           bit_4.0.5                httr_1.4.6               fgsea_1.24.0            
 [51] gplots_3.1.3             pkgconfig_2.0.3          XML_3.99-0.14            farver_2.1.1             dbplyr_2.3.2            
 [56] locfit_1.5-9.7           utf8_1.2.3               labeling_0.4.2           ggplotify_0.1.0          tidyselect_1.2.0        
 [61] rlang_1.1.1              munsell_0.5.0            cellranger_1.1.0         tools_4.2.3              cachem_1.0.8            
 [66] downloader_0.4           cli_3.6.1                generics_0.1.3           RSQLite_2.3.1            gson_0.1.0              
 [71] evaluate_0.21            fastmap_1.1.1            yaml_2.3.7               ggtree_3.6.2             knitr_1.42              
 [76] bit64_4.0.5              tidygraph_1.2.3          caTools_1.18.2           purrr_1.0.1              KEGGREST_1.38.0         
 [81] ggraph_2.1.0             nlme_3.1-162             aplot_0.1.10             xml2_1.3.4               compiler_4.2.3          
 [86] rstudioapi_0.14          filelock_1.0.2           curl_5.0.0               png_0.1-8                treeio_1.25.2           
 [91] geneplotter_1.76.0       tibble_3.2.1             tweenr_2.0.2             stringi_1.7.12           lattice_0.20-45         
 [96] Matrix_1.5-3             vctrs_0.6.2              pillar_1.9.0             lifecycle_1.0.3          cowplot_1.1.1           
[101] bitops_1.0-7             patchwork_1.1.2          rtracklayer_1.58.0       qvalue_2.30.0            R6_2.5.1                
[106] BiocIO_1.8.0             renv_0.17.3              KernSmooth_2.23-20       gridExtra_2.3            codetools_0.2-19        
[111] boot_1.3-28.1            MASS_7.3-58.2            gtools_3.9.4             rjson_0.2.21             withr_2.5.0             
[116] GenomicAlignments_1.34.1 Rsamtools_2.14.0         GenomeInfoDbData_1.2.9   parallel_4.2.3           hms_1.1.3               
[121] grid_4.2.3               ggfun_0.0.9              tidyr_1.3.0              HDO.db_0.99.1            rmarkdown_2.21          
[126] ggforce_0.4.1            restfulr_0.0.15

Updated 8 months ago by James W. MacDonald 65k • written 8 months ago by Max ▴ 10

score 2 · Answer 1 · 2023-08-02

Mike Smith will probably be along in a while to explain what's happening with biomaRt. In the interim, you can always use the Orthology.eg.db package to do the mapping.

library(org.Hs.eg.db)
library(org.Mm.eg.db)
library(Orthology.eg.db)

mapIt <- function(mouseids, horg, morg, orth){
    ## mouse gene symbols -> mouse Entrez Gene IDs
    mouseg <- mapIds(morg, mouseids, "ENTREZID", "SYMBOL")
    ## mouse Entrez Gene IDs -> human Entrez Gene IDs
    mapped <- select(orth, mouseg, "Homo_sapiens","Mus_musculus")
    names(mapped) <- c("Mus_egid","Homo_egid")
    ## human Entrez Gene IDs -> human gene symbols
    husymb <- select(horg, as.character(mapped[,2]), "SYMBOL","ENTREZID")
    return(data.frame(Mus_symbol = mouseids,
                      mapped,
                      Homo_symbol = husymb[,2]))
}

## dumb example
> mapIt(head(keys(org.Mm.eg.db, "SYMBOL"), 20), org.Hs.eg.db, org.Mm.eg.db, Orthology.eg.db)
'select()' returned 1:1 mapping between keys and columns
'select()' returned many:1 mapping between keys and columns
         Mus_symbol Mus_egid Homo_egid Homo_symbol
Pzp             Pzp    11287        NA        <NA>
Aanat         Aanat    11298        15       AANAT
Aatk           Aatk    11302      9625        AATK
Abca1         Abca1    11303        19       ABCA1
Abca4         Abca4    11304        24       ABCA4
Abca2         Abca2    11305        20       ABCA2
Abcb7         Abcb7    11306        22       ABCB7
Abcg1         Abcg1    11307      9619       ABCG1
Abi1           Abi1    11308     10006        ABI1
Abl1           Abl1    11350        25        ABL1
Abl2           Abl2    11352        27        ABL2
Scgb1b27   Scgb1b27    11354        NA        <NA>
ac               ac    11358        NA        <NA>
Acadl         Acadl    11363        33       ACADL
Acadm         Acadm    11364        34       ACADM
Acadvl       Acadvl    11370        37      ACADVL
Acads         Acads    11409        35       ACADS
Slc33a1     Slc33a1    11416      9197     SLC33A1
Asic2         Asic2    11418        40       ASIC2
Asic1         Asic1    11419        41       ASIC1

Do note, however, that the released package is really slow at present, due to me being a big ol' dummy and not adding an index to the underlying database. The timing below comes from an indexed db that I am using at present, and it's fast.

> system.time(mapIt(head(keys(org.Mm.eg.db, "SYMBOL"), 20), org.Hs.eg.db, org.Mm.eg.db, Orthology.eg.db))
'select()' returned 1:1 mapping between keys and columns
'select()' returned many:1 mapping between keys and columns
   user  system elapsed 
   0.20    0.03    0.23

I'll update the code that makes the db and in the next release it'll be faster.
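The original question wanted a unique vector of human symbols, whereas mapIt() returns one row per mouse gene with NA where no ortholog was found. A minimal sketch of that last step follows. The data frame here is a toy copy of a few rows from the example output above, not a live query, and the variable names are only illustrative.

```r
# Toy copy of a few rows from the mapIt() example output (not a live query).
mapped <- data.frame(
  Mus_symbol  = c("Pzp", "Aanat", "Aatk", "Scgb1b27", "Abca1"),
  Mus_egid    = c("11287", "11298", "11302", "11354", "11303"),
  Homo_egid   = c(NA, "15", "9625", NA, "19"),
  Homo_symbol = c(NA, "AANAT", "AATK", NA, "ABCA1"),
  stringsAsFactors = FALSE
)

# Keep only mouse genes that have a human ortholog, then reduce the human
# symbols to unique values; also record which mouse genes had no match.
has_ortholog   <- !is.na(mapped$Homo_symbol)
human_symbols  <- unique(mapped$Homo_symbol[has_ortholog])
unmapped_mouse <- mapped$Mus_symbol[!has_ortholog]

print(human_symbols)
print(unmapped_mouse)
```

Keeping the unmapped mouse symbols around makes it easy to see how many genes (predicted genes and pseudogenes such as the "Gm..." entries, presumably) drop out of the conversion.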