Question

How to map geneIds to ENTREZIDs

stacy.genovese • 0

I know there are other questions on this and I've been trying to adapt the answers to my situation but am having problems. I have the following GRanges object stored in a variable, E2F4_xl:

GRanges object with 490 ranges and 15 metadata columns:
        seqnames              ranges strand |             name     score signalValue    pValue    qValue peakSummit             annotation
           <Rle>           <IRanges>  <Rle> |      <character> <numeric>   <numeric> <numeric> <numeric>  <integer>            <character>
    [1]    chr1L     2928590-2929203      * |   E2F4_xl_peak_1       459    17.41493   51.6746  45.95458        352      Distal Intergenic
    [2]    chr1L     7693350-7693892      * |   E2F4_xl_peak_2       198    13.07112   24.9757  19.88318        276 Intron (XM_018244320..
    [3]    chr1L     9637849-9638155      * |   E2F4_xl_peak_3       102     9.12290   14.9644  10.24909        253      Distal Intergenic
    [4]    chr1L   21649276-21649760      * |   E2F4_xl_peak_4        69     7.07035   11.4591   6.94732        283      Distal Intergenic
    [5]    chr1L   32643798-32644338      * |   E2F4_xl_peak_5       127    10.22822   17.5269  12.70323        257      Distal Intergenic
    ...      ...                 ...    ... .              ...       ...         ...       ...       ...        ...                    ...
  [486] chr9_10S   64286399-64286992      * | E2F4_xl_peak_486        51     6.51636   9.49010   5.11513        422      Distal Intergenic
  [487] chr9_10S   65295904-65296737      * | E2F4_xl_peak_487       604    16.52431  66.55741  60.46862        217      Distal Intergenic
  [488] chr9_10S   81345129-81345425      * | E2F4_xl_peak_488        35     5.52355   7.81482   3.59655         92      Distal Intergenic
  [489] chr9_10S 104113648-104113944      * | E2F4_xl_peak_489        35     5.52355   7.81482   3.59655         78 Intron (XM_018237823..
  [490] chr9_10S 104550953-104551249      * | E2F4_xl_peak_490        35     5.52355   7.81482   3.59655        274      Distal Intergenic
          geneChr geneStart   geneEnd geneLength geneStrand      geneId   transcriptId distanceToTSS
        <integer> <integer> <integer>  <integer>  <integer> <character>    <character>     <numeric>
    [1]         1   2895509   2909444      13936          2   108697130 XM_018226987.1        -19146
    [2]         1   7677015   7709707      32693          1      431959 NM_001091439.1         16335
    [3]         1   9687102   9724600      37499          1   108719539 XM_018268459.1        -48947
    [4]         1  21687868  21759500      71633          1   108719589 XM_018268528.1        -38108
    [5]         1  32681053  32712511      31459          1      373579 XM_018253679.1        -36715
    ...       ...       ...       ...        ...        ...         ...            ...           ...
  [486]        19  64269336  64272519       3184          2      444022 NM_001092127.1        -13880
  [487]        19  65316252  65353996      37745          1   108703023 XM_018238906.1        -19515
  [488]        19  80964756  81174142     209387          2   108703109 XM_018239095.1       -170987
  [489]        19 104110304 104140692      30389          1   108702352 XM_018237823.1          3344
  [490]        19 104374664 104384049       9386          2   108702356 XM_018237827.1       -166904
  -------

and I need to get the ENTREZID and/or geneName for each geneId. The above are from Xenopus laevis.

Right now I've tried:

E2F4_xl_symbols <- AnnotationDbi::select(xlaevis.db,  keys = E2F4_xl$geneId,  columns = c("ENTREZID", "GENENAME"), keytype = "GENENAME")

but this doesn't make sense because I don't have a geneName column. I just have a geneId column. And when I run columns(xlaevis.db), none of the options match anything that I have. So I tried:

ncids <- mapIds(org.Xl.eg.db, E2F4_xl$geneIds, "ENTREZID", "geneId")

But this also doesn't make sense for the same reason.

I've also tried to create a TxDb file and use that to annotate the peaks (thanks to James MacDonald for the wonderful help!):

TxDb.xlaevis.USCS <- makeTxDbPackageFromUCSC(version="0.01", 
                                             maintainer = "my email",
                                             author = "my email",
                                             genome="xenLae2",
                                             tablename="ncbiRefSeq")

When I run this I get an error that says 'xenLae2' is not a registered UCSC genome, but from what I can tell it is.

Can anyone help me out? I feel like I'm 90% of the way there and just need a push over the finish line. There must be a way to take the geneId column that I have and get the corresponding geneName/ENTREZID for Xenopus laevis. Once I have this I need to convert the genes to human genes, but I'm taking this one step at a time.

sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] data.table_1.14.6                        RMariaDB_1.2.2                           bedr_1.0.7                              
 [4] clusterProfiler_4.6.0                    ChIPseeker_1.34.1                        fastqcr_0.1.2                           
 [7] rtracklayer_1.58.0                       org.Hs.eg.db_3.16.0                      EnsDb.Hsapiens.v86_2.99.0               
[10] ensembldb_2.22.0                         AnnotationFilter_1.22.0                  xlaevis.db_3.2.3                        
[13] org.Xl.eg.db_3.16.0                      TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0 GenomicFeatures_1.50.3                  
[16] AnnotationDbi_1.60.0                     Rsamtools_2.14.0                         Biostrings_2.66.0                       
[19] XVector_0.38.0                           ChIPQC_1.34.1                            BiocParallel_1.32.5                     
[22] DiffBind_3.8.4                           SummarizedExperiment_1.28.0              Biobase_2.58.0                          
[25] MatrixGenerics_1.10.0                    matrixStats_0.63.0                       GenomicRanges_1.50.2                    
[28] GenomeInfoDb_1.34.6                      IRanges_2.32.0                           S4Vectors_0.36.1                        
[31] BiocGenerics_0.44.0                      ggplot2_3.4.0                            BiocManager_1.30.19                     

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                                R.utils_2.12.2                            tidyselect_1.2.0                         
  [4] RSQLite_2.2.20                            htmlwidgets_1.6.1                         grid_4.2.2                               
  [7] scatterpie_0.1.8                          munsell_0.5.0                             codetools_0.2-18                         
 [10] interp_1.1-3                              systemPipeR_2.4.0                         withr_2.5.0                              
 [13] colorspace_2.0-3                          GOSemSim_2.24.0                           filelock_1.0.2                           
 [16] rstudioapi_0.14                           rJava_1.0-6                               DOSE_3.24.2                              
 [19] bbmle_1.0.25                              GenomeInfoDbData_1.2.9                    mixsqp_0.3-48                            
 [22] hwriter_1.3.2.1                           polyclip_1.10-4                           bit64_4.0.5                              
 [25] farver_2.1.1                              downloader_0.4                            treeio_1.22.0                            
 [28] coda_0.19-4                               vctrs_0.5.1                               TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2  
 [31] generics_0.1.3                            lambda.r_1.2.4                            gson_0.0.9                               
 [34] timechange_0.2.0                          BiocFileCache_2.6.0                       R6_2.5.1                                 
 [37] apeglm_1.20.0                             graphlayouts_0.8.4                        invgamma_1.1                             
 [40] locfit_1.5-9.7                            gridGraphics_0.5-1                        bitops_1.0-7                             
 [43] cachem_1.0.6                              fgsea_1.24.0                              DelayedArray_0.24.0                      
 [46] assertthat_0.2.1                          BiocIO_1.8.0                              scales_1.2.1                             
 [49] ggraph_2.1.0                              enrichplot_1.18.3                         gtable_0.3.1                             
 [52] tidygraph_1.2.2                           xlsx_0.6.5                                rlang_1.0.6                              
 [55] splines_4.2.2                             lazyeval_0.2.2                            yaml_2.3.6                               
 [58] reshape2_1.4.4                            TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 qvalue_2.30.0                            
 [61] tools_4.2.2                               ggplotify_0.1.0                           ellipsis_0.3.2                           
 [64] gplots_3.1.3                              RColorBrewer_1.1-3                        Rcpp_1.0.9                               
 [67] plyr_1.8.8                                progress_1.2.2                            zlibbioc_1.44.0                          
 [70] purrr_1.0.1                               RCurl_1.98-1.9                            prettyunits_1.1.1                        
 [73] deldir_1.0-6                              viridis_0.6.2                             ashr_2.2-54                              
 [76] cowplot_1.1.1                             chipseq_1.48.0                            ggrepel_0.9.2                            
 [79] magrittr_2.0.3                            futile.options_1.0.1                      TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2  
 [82] openxlsx_4.2.5.1                          truncnorm_1.0-8                           mvtnorm_1.1-3                            
 [85] SQUAREM_2021.1                            amap_0.8-19                               ProtGenerics_1.30.0                      
 [88] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2   patchwork_1.1.2                           hms_1.1.2                                
 [91] xlsxjars_0.6.1                            HDO.db_0.99.1                             XML_3.99-0.13                            
 [94] VennDiagram_1.7.3                         emdbook_1.3.12                            jpeg_0.1-10                              
 [97] gridExtra_2.3                             testthat_3.1.6                            compiler_4.2.2                           
[100] biomaRt_2.54.0                            bdsmatrix_1.3-6                           tibble_3.1.8                             
[103] shadowtext_0.1.2                          KernSmooth_2.23-20                        crayon_1.5.2                             
[106] R.oo_1.25.0                               htmltools_0.5.4                           ggfun_0.0.9                              
[109] tidyr_1.2.1                               aplot_0.1.9                               lubridate_1.9.0                          
[112] DBI_1.1.3                                 formatR_1.14                              tweenr_2.0.2                             
[115] dbplyr_2.3.0                              MASS_7.3-58.1                             rappdirs_0.3.3                           
[118] boot_1.3-28.1                             ShortRead_1.56.1                          Matrix_1.5-3                             
[121] brio_1.1.3                                cli_3.6.0                                 R.methodsS3_1.8.2                        
[124] parallel_4.2.2                            igraph_1.3.5                              pkgconfig_2.0.3                          
[127] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2   GenomicAlignments_1.34.0                  numDeriv_2016.8-1.1                      
[130] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2      xml2_1.3.3                                ggtree_3.6.2                             
[133] yulab.utils_0.0.6                         stringr_1.5.0                             digest_0.6.31                            
[136] fastmatch_1.1-3                           tidytree_0.4.2                            restfulr_0.0.15                          
[139] GreyListChIP_1.30.0                       curl_5.0.0                                gtools_3.9.4                             
[142] rjson_0.2.21                              jsonlite_1.8.4                            nlme_3.1-161                             
[145] lifecycle_1.0.3                           futile.logger_1.4.3                       viridisLite_0.4.1                        
[148] limma_3.54.0                              BSgenome_1.66.2                           fansi_1.0.3                              
[151] pillar_1.8.1                              lattice_0.20-45                           Nozzle.R1_1.1-1.1                        
[154] plotrix_3.8-2                             KEGGREST_1.38.0                           fastmap_1.1.0                            
[157] httr_1.4.4                                GO.db_3.16.0                              glue_1.6.2                               
[160] zip_2.2.2                                 png_0.1-8                                 bit_4.0.5                                
[163] ggforce_0.4.1                             stringi_1.7.12                            blob_1.2.3                               
[166] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 latticeExtra_0.6-30                       caTools_1.18.2                           
[169] memoise_2.0.1                             dplyr_1.0.10                              ape_5.6-2                                
[172] irlba_2.3.5.1

Xenopus_laevis ChIPpeakAnno • 1.3k views

score 1 · Accepted Answer · 2023-01-30

James W. MacDonald 68k

You are close. Note that the geneId values in your GRanges object are NCBI Gene IDs, which used to be called Entrez Gene IDs and which the annotation packages call ENTREZID. In other words, you already have the ENTREZID and only need to map it to a symbol:

library(org.Xl.eg.db)
E2F4_xl$SYMBOL <- mapIds(org.Xl.eg.db, E2F4_xl$geneId, "SYMBOL", "ENTREZID")
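
If it helps, here is a slightly fuller sketch of the same idea that can be run without the GRanges object. The three IDs are copied from the geneId column in the question; that they are all present in whatever version of org.Xl.eg.db you have installed is an assumption, as is the choice of multiVals = "first" for IDs that map to more than one value. The variable names (ids, symbols, gene_names, unmapped) are just for illustration.

```r
library(org.Xl.eg.db)
library(AnnotationDbi)

# A few NCBI Gene IDs copied from the geneId column in the question.
# mapIds() expects character keys, so coerce to be safe.
ids <- as.character(c("108697130", "431959", "108719539"))

# "ENTREZID" is one of the key types the org package understands;
# keytypes() lists all of them.
stopifnot("ENTREZID" %in% keytypes(org.Xl.eg.db))

# Map each Gene ID to a symbol and a descriptive gene name.
# multiVals = "first" keeps one value per key (an assumption; use "list"
# if you want to see every match).
symbols <- mapIds(org.Xl.eg.db, keys = ids, column = "SYMBOL",
                  keytype = "ENTREZID", multiVals = "first")
gene_names <- mapIds(org.Xl.eg.db, keys = ids, column = "GENENAME",
                     keytype = "ENTREZID", multiVals = "first")

# IDs missing from the package come back as NA, so it is worth counting them.
unmapped <- sum(is.na(symbols))
```

With the real data you would pass E2F4_xl$geneId as the keys and assign the results back to metadata columns, exactly as in the one-liner above. Any IDs that come back NA are simply not in that release of the org package.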

James - I wish I knew where you lived so I could buy you a beer! Thank you so much for your help. I'm going to work now on converting these genes to human genes. I really can't thank you enough!

ADD REPLY • link 3.1 years ago stacy.genovese • 0