Trouble with ENSEMBL IDs in org.Ss.eg.db
@carinegenet-10909
France

Dear all,

For several months I have had trouble using ENSEMBL IDs with org.Ss.eg.db. First, when I load the package with library("org.Ss.eg.db"), I get the following message: "org.Ss.eg contains GO mappings based on older data because the Blast2GO data resource was removed from the public domain just before the most recent update was produced. We are working on an alternative means to get this kind of data before the next release."

This seems to imply that I cannot use ENSEMBL IDs at all for pig. I tested with org.Bt.eg.db (see below) and it works. I know that I can use biomaRt to convert ENSEMBL IDs to symbols, for example, but I also use some packages that accept only ENSEMBL IDs or that require unique identifiers. I also perform network analysis, where ENSEMBL IDs provide the link between Gene Ontology/pathway data and genes. Any help will be greatly appreciated. Regards, Carine

rm(list=ls())
library(dplyr)
library(DESeq2)
library("org.Ss.eg.db")
load(file="respig_04juin.rda") #load DESEQ2res object pig 
> head(res)
log2 fold change (MLE): Treatment vs CTRL 
Wald test p-value: Treatment vs CTRL 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue        padj
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
ENSSSCG00000000002   108.186     -1.6866744 0.1772671 -9.514877 1.81927e-21 2.88712e-20
ENSSSCG00000000003   485.236      0.2635016 0.0543566  4.847643 1.24937e-06 5.43363e-06
ENSSSCG00000000005   217.099     -0.0369672 0.0734253 -0.503466 6.14636e-01 7.14060e-01
ENSSSCG00000000006   119.237     -0.2583996 0.1059635 -2.438572 1.47454e-02 3.13157e-02
ENSSSCG00000000007   218.799      0.0656453 0.0789420  0.831563 4.05655e-01 5.20681e-01
ENSSSCG00000000010  1677.887      0.6257250 0.0497819 12.569327 3.11361e-36 1.10954e-34
> library("AnnotationDbi")
> res$symbol<-mapIds(org.Ss.eg.db,keys = row.names(res),column = "SYMBOL",keytype = "ENSEMBL", multiVals = "first")
Error in testForValidKeytype(x, keytype) : 
  Invalid keytype: ENSEMBL. Please use the keytypes method to see a listing of valid arguments.

load(file="resbov_04juin.rda") #load DESEQ2res object Bovine species
> head(res)
log2 fold change (MLE): Treatment vs CTRL 
Wald test p-value: Treatment vs CTRL 
DataFrame with 6 rows and 6 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue        padj
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
ENSBTAG00000000005  28.85007     -0.2688708 0.1897403 -1.417047 1.56469e-01 0.310875837
ENSBTAG00000000009   2.60652      0.7732095 0.6467508  1.195529 2.31880e-01 0.412542771
ENSBTAG00000000010 599.01895      0.1479371 0.0522362  2.832081 4.62462e-03 0.017988122
ENSBTAG00000000011  13.05645      1.2050085 0.2859651  4.213830 2.51076e-05 0.000182048
ENSBTAG00000000012 198.28937     -0.0591896 0.0762287 -0.776475 4.37469e-01 0.626871047
ENSBTAG00000000013 704.51548     -0.1256780 0.0437164 -2.874847 4.04223e-03 0.016057889
> library("org.Bt.eg.db")
> res$symbol<-mapIds(org.Bt.eg.db,keys = row.names(res),column = "SYMBOL",keytype = "ENSEMBL", multiVals = "first")
'select()' returned 1:many mapping between keys and columns
> head(res)
log2 fold change (MLE): Treatment BMP15 vs CTRL 
Wald test p-value: Treatment BMP15 vs CTRL 
DataFrame with 6 rows and 7 columns
                    baseMean log2FoldChange     lfcSE      stat      pvalue        padj      symbol
                   <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric> <character>
ENSBTAG00000000005  28.85007     -0.2688708 0.1897403 -1.417047 1.56469e-01 0.310875837        GRK3
ENSBTAG00000000009   2.60652      0.7732095 0.6467508  1.195529 2.31880e-01 0.412542771       FOXF1
ENSBTAG00000000010 599.01895      0.1479371 0.0522362  2.832081 4.62462e-03 0.017988122        UBL7
ENSBTAG00000000011  13.05645      1.2050085 0.2859651  4.213830 2.51076e-05 0.000182048         TDH
ENSBTAG00000000012 198.28937     -0.0591896 0.0762287 -0.776475 4.37469e-01 0.626871047       TTC33
ENSBTAG00000000013 704.51548     -0.1256780 0.0437164 -2.874847 4.04223e-03 0.016057889      PRKAA1
> 
sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] org.Bt.eg.db_3.13.0         org.Ss.eg.db_3.13.0         AnnotationDbi_1.54.0        GeneTonic_1.5.2            
 [5] DESeq2_1.32.0               SummarizedExperiment_1.22.0 MatrixGenerics_1.4.0        matrixStats_0.59.0         
 [9] GenomicRanges_1.44.0        GenomeInfoDb_1.28.0         IRanges_2.26.0              S4Vectors_0.30.0           
[13] dplyr_1.0.6                 Biobase_2.52.0              BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
  [1] utf8_1.2.1             shinydashboard_0.7.1   tidyselect_1.1.1       heatmaply_1.2.1        RSQLite_2.2.7         
  [6] htmlwidgets_1.5.3      grid_4.1.0             TSP_1.1-10             BiocParallel_1.26.0    scatterpie_0.1.6      
 [11] munsell_0.5.0          codetools_0.2-18       DT_0.18                miniUI_0.1.1.1         withr_2.4.2           
 [16] colorspace_2.0-1       GOSemSim_2.18.0        Category_2.58.0        filelock_1.0.2         pcaExplorer_2.18.0    
 [21] knitr_1.33             rstudioapi_0.13        shinyWidgets_0.6.0     DOSE_3.18.0            NMF_0.23.0            
 [26] GenomeInfoDbData_1.2.6 polyclip_1.10-0        topGO_2.44.0           bit64_4.0.5            farver_2.1.0          
 [31] pheatmap_1.0.12        downloader_0.4         vctrs_0.3.8            treeio_1.16.1          generics_0.1.0        
 [36] xfun_0.23              BiocFileCache_2.0.0    R6_2.5.0               doParallel_1.0.16      clue_0.3-59           
 [41] graphlayouts_0.7.1     seriation_1.2-9        locfit_1.5-9.4         bitops_1.0-7           cachem_1.0.5          
 [46] shinyAce_0.4.1         fgsea_1.18.0           DelayedArray_0.18.0    assertthat_0.2.1       shinycssloaders_1.0.0 
 [51] promises_1.2.0.1       scales_1.1.1           ggraph_2.0.5           enrichplot_1.12.0      gtable_0.3.0          
 [56] Cairo_1.5-12.2         tidygraph_1.2.0        rlang_0.4.11           genefilter_1.74.0      GlobalOptions_0.1.2   
 [61] splines_4.1.0          lazyeval_0.2.2         bs4Dash_2.0.0          shinyBS_0.61           BiocManager_1.30.15   
 [66] yaml_2.2.1             reshape2_1.4.4         threejs_0.3.3          crosstalk_1.1.1        httpuv_1.6.1          
 [71] qvalue_2.24.0          clusterProfiler_4.0.0  RBGL_1.68.0            backbone_1.4.0         tools_4.1.0           
 [76] gridBase_0.4-7         ggplot2_3.3.3          ellipsis_0.3.2         jquerylib_0.1.4        RColorBrewer_1.1-2    
 [81] dynamicTreeCut_1.63-1  Rcpp_1.0.6             plyr_1.8.6             base64enc_0.1-3        visNetwork_2.0.9      
 [86] progress_1.2.2         zlibbioc_1.38.0        purrr_0.3.4            RCurl_1.98-1.3         prettyunits_1.1.1     
 [91] GetoptLong_1.0.5       viridis_0.6.1          cowplot_1.1.1          ggrepel_0.9.1          cluster_2.1.2         
 [96] magrittr_2.0.1         data.table_1.14.0      DO.db_2.9              circlize_0.4.12        SparseM_1.81          
[101] colourpicker_1.1.0     hms_1.1.0              patchwork_1.1.1        mime_0.10              evaluate_0.14         
[106] xtable_1.8-4           XML_3.99-0.6           shape_1.4.6            gridExtra_2.3          compiler_4.1.0        
[111] biomaRt_2.48.0         tibble_3.1.2           crayon_1.4.1           shadowtext_0.0.8       htmltools_0.5.1.1     
[116] GOstats_2.58.0         later_1.2.0            tidyr_1.1.3            geneplotter_1.70.0     aplot_0.0.6           
[121] expm_0.999-6           DBI_1.1.1              tweenr_1.0.2           dbplyr_2.1.1           ComplexHeatmap_2.8.0  
[126] MASS_7.3-54            rappdirs_0.3.3         Matrix_1.3-4           igraph_1.2.6           pkgconfig_2.0.3       
[131] rvcheck_0.1.8          registry_0.5-1         plotly_4.9.3           foreach_1.5.1          ggtree_3.0.2          
[136] annotate_1.70.0        bslib_0.2.5.1          rngtools_1.5           pkgmaker_0.32.2        webshot_0.5.2         
[141] XVector_0.32.0         AnnotationForge_1.34.0 stringr_1.4.0          digest_0.6.27          graph_1.70.0          
[146] Biostrings_2.60.0      rmarkdown_2.8          fastmatch_1.1-0        rintrojs_0.2.2         tidytree_0.3.4        
[151] dendextend_1.15.1      GSEABase_1.54.0        curl_4.3.1             shiny_1.6.0            rjson_0.2.20          
[156] lifecycle_1.0.0        nlme_3.1-152           jsonlite_1.7.2         viridisLite_0.4.0      limma_3.48.0          
[161] fansi_0.5.0            pillar_1.6.1           lattice_0.20-44        KEGGREST_1.32.0        fastmap_1.1.0         
[166] httr_1.4.2             survival_3.2-11        GO.db_3.13.0           glue_1.4.2             png_0.1-7             
[171] iterators_1.0.13       bit_4.0.4              Rgraphviz_2.36.0       sass_0.4.0             ggforce_0.3.3         
[176] stringi_1.6.2          blob_1.2.1             memoise_2.0.0          ape_5.5               
>
@james-w-macdonald-5106
United States

There are two issues here. First, as you note, there is a warning (actually from the distant past) saying that the GO mappings are old. This is not true, and hasn't been for years. We did formerly use Blast2GO to get the mappings, but now use UniProt. The person who fixed the issue never rescinded the warning, so it has persisted for years now, and you are the first one IIRC to point it out. It's been fixed in both release and devel, so if you update your AnnotationDbi package in a day or two, you will no longer see that warning.

The second issue is the lack of mappings between Ensembl and NCBI in this package (there are actually no direct Ensembl -> Symbol mappings, they all go through the NCBI Gene ID, which is the central ID for this package). There has never actually been an Ensembl table for pig. We only get the mappings for these 'supported' species:

dataset <- c("hsapiens_gene_ensembl", "rnorvegicus_gene_ensembl",
             "ggallus_gene_ensembl", "drerio_gene_ensembl",
             "celegans_gene_ensembl", "dmelanogaster_gene_ensembl",
             "mmusculus_gene_ensembl", "btaurus_gene_ensembl",
             "clfamiliaris_gene_ensembl", "scerevisiae_gene_ensembl",
             "mmulatta_gene_ensembl", "ptroglodytes_gene_ensembl",
             "agambiae_eg_gene")

And as you note, Bos taurus does have this mapping. We could hypothetically add this table to the org.Ss.eg.db package, but if we do, it won't happen before the next release. So for now I would recommend just using the biomaRt package to do the mappings.
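As a rough illustration of that recommendation, the sketch below first checks whether the installed org.Ss.eg.db exposes an ENSEMBL keytype and, if not, falls back to biomaRt. The `ens_ids` vector is a hypothetical stand-in for `row.names(res)`, and the dataset name `sscrofa_gene_ensembl` and the attribute names are assumptions about the current Ensembl BioMart; they can be checked with `listDatasets()` and `listAttributes()`. The biomaRt branch needs internet access.

```r
library(AnnotationDbi)
library(org.Ss.eg.db)
library(biomaRt)

# Hypothetical input: a few pig Ensembl gene IDs standing in for row.names(res)
ens_ids <- c("ENSSSCG00000000002", "ENSSSCG00000000003", "ENSSSCG00000000005")

if ("ENSEMBL" %in% keytypes(org.Ss.eg.db)) {
  # The installed package has an Ensembl table, so mapIds() can be used directly
  symbols <- mapIds(org.Ss.eg.db, keys = ens_ids, column = "SYMBOL",
                    keytype = "ENSEMBL", multiVals = "first")
} else {
  # No Ensembl table: query the Ensembl BioMart instead.
  # Dataset and attribute names are assumptions; verify them with
  # listDatasets(mart) and listAttributes(mart).
  mart <- useEnsembl(biomart = "genes", dataset = "sscrofa_gene_ensembl")
  bm <- getBM(attributes = c("ensembl_gene_id", "external_gene_name",
                             "entrezgene_id"),
              filters = "ensembl_gene_id", values = ens_ids, mart = mart)
  # getBM() does not preserve input order and may return several rows per
  # gene; match() keeps the first hit per ID and restores the input order.
  idx <- match(ens_ids, bm$ensembl_gene_id)
  symbols <- setNames(bm$external_gene_name[idx], ens_ids)
}

symbols
```

Genes without an assigned name may come back from BioMart as empty strings or `NA`. The `entrezgene_id` column in `bm` holds NCBI Gene IDs, which could then be passed to org.Ss.eg.db with `keytype = "ENTREZID"` to reach the other annotations in that package.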


Thank you for your clear answer and for fixing the first issue (;-).

As for the second issue: because I usually work with human, Danio or mouse, I never realised that these are "supported" species. I will use the biomaRt package. Thank you again. Cheers
