I know there are other questions on this, and I've been trying to adapt their answers to my situation, but I'm running into problems. I have the following GRanges object stored in a variable, E2F4_xl:
GRanges object with 490 ranges and 15 metadata columns:
seqnames ranges strand | name score signalValue pValue qValue peakSummit annotation
<Rle> <IRanges> <Rle> | <character> <numeric> <numeric> <numeric> <numeric> <integer> <character>
[1] chr1L 2928590-2929203 * | E2F4_xl_peak_1 459 17.41493 51.6746 45.95458 352 Distal Intergenic
[2] chr1L 7693350-7693892 * | E2F4_xl_peak_2 198 13.07112 24.9757 19.88318 276 Intron (XM_018244320..
[3] chr1L 9637849-9638155 * | E2F4_xl_peak_3 102 9.12290 14.9644 10.24909 253 Distal Intergenic
[4] chr1L 21649276-21649760 * | E2F4_xl_peak_4 69 7.07035 11.4591 6.94732 283 Distal Intergenic
[5] chr1L 32643798-32644338 * | E2F4_xl_peak_5 127 10.22822 17.5269 12.70323 257 Distal Intergenic
... ... ... ... . ... ... ... ... ... ... ...
[486] chr9_10S 64286399-64286992 * | E2F4_xl_peak_486 51 6.51636 9.49010 5.11513 422 Distal Intergenic
[487] chr9_10S 65295904-65296737 * | E2F4_xl_peak_487 604 16.52431 66.55741 60.46862 217 Distal Intergenic
[488] chr9_10S 81345129-81345425 * | E2F4_xl_peak_488 35 5.52355 7.81482 3.59655 92 Distal Intergenic
[489] chr9_10S 104113648-104113944 * | E2F4_xl_peak_489 35 5.52355 7.81482 3.59655 78 Intron (XM_018237823..
[490] chr9_10S 104550953-104551249 * | E2F4_xl_peak_490 35 5.52355 7.81482 3.59655 274 Distal Intergenic
geneChr geneStart geneEnd geneLength geneStrand geneId transcriptId distanceToTSS
<integer> <integer> <integer> <integer> <integer> <character> <character> <numeric>
[1] 1 2895509 2909444 13936 2 108697130 XM_018226987.1 -19146
[2] 1 7677015 7709707 32693 1 431959 NM_001091439.1 16335
[3] 1 9687102 9724600 37499 1 108719539 XM_018268459.1 -48947
[4] 1 21687868 21759500 71633 1 108719589 XM_018268528.1 -38108
[5] 1 32681053 32712511 31459 1 373579 XM_018253679.1 -36715
... ... ... ... ... ... ... ... ...
[486] 19 64269336 64272519 3184 2 444022 NM_001092127.1 -13880
[487] 19 65316252 65353996 37745 1 108703023 XM_018238906.1 -19515
[488] 19 80964756 81174142 209387 2 108703109 XM_018239095.1 -170987
[489] 19 104110304 104140692 30389 1 108702352 XM_018237823.1 3344
[490] 19 104374664 104384049 9386 2 108702356 XM_018237827.1 -166904
-------
I need to get the ENTREZID and/or gene name for each geneId. These peaks are from Xenopus laevis.
So far I've tried:
E2F4_xl_symbols <- AnnotationDbi::select(xlaevis.db, keys = E2F4_xl$geneId, columns = c("ENTREZID", "GENENAME"), keytype = "GENENAME")
But this doesn't make sense, because I don't have a geneName column; I only have a geneId column. And when I run columns(xlaevis.db), none of the options match anything I have. So I tried:
ncids <- mapIds(org.Xl.eg.db, E2F4_xl$geneIds, "ENTREZID", "geneId")
But this also doesn't make sense for the same reason.
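For reference, the geneId values shown above (for example 108697130 and 431959) look like NCBI Entrez Gene IDs, in which case the key type to pass would be "ENTREZID" rather than "GENENAME" or "geneId". Below is a minimal sketch of that lookup. It assumes the IDs really are Entrez Gene IDs and that org.Xl.eg.db exposes SYMBOL and GENENAME columns (keytypes() and columns() will confirm this). The three example IDs are copied from the output above; with the real object one would use E2F4_xl$geneId instead.

```r
library(AnnotationDbi)
library(org.Xl.eg.db)

# See which identifier types the package can be queried with.
keytypes(org.Xl.eg.db)

# Example IDs copied from the geneId column above; assumed to be Entrez Gene IDs.
# With the real object: gene_ids <- as.character(E2F4_xl$geneId)
gene_ids <- c("108697130", "431959", "108719539")

# Look up each unique ID once; keys must be character, not integer.
anno <- AnnotationDbi::select(org.Xl.eg.db,
                              keys = unique(gene_ids),
                              columns = c("SYMBOL", "GENENAME"),
                              keytype = "ENTREZID")

# Align the results back to the original order (duplicates included).
symbols <- anno$SYMBOL[match(gene_ids, anno$ENTREZID)]
gene_names <- anno$GENENAME[match(gene_ids, anno$ENTREZID)]
```

With the GRanges object, the aligned vectors could then be attached as metadata columns, e.g. E2F4_xl$symbol <- symbols.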
I've also tried to create a TxDb file and use that to annotate the peaks (thanks to James MacDonald for the wonderful help!):
TxDb.xlaevis.USCS <- makeTxDbPackageFromUCSC(version = "0.01",
                                             maintainer = "my email",
                                             author = "my email",
                                             genome = "xenLae2",
                                             tablename = "ncbiRefSeq")
When I run this, I get an error saying that 'xenLae2' is not a registered UCSC genome, but from what I can tell it is.
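One way to check the registration directly, along with a possible fallback if the genome turns out not to be registered in the installed GenomeInfoDb version, is sketched here. This is a sketch only: registered_UCSC_genomes() comes from GenomeInfoDb, the GTF path is a hypothetical placeholder for an ncbiRefSeq GTF downloaded separately, and whether that file suits this analysis is an assumption.

```r
library(GenomeInfoDb)
library(GenomicFeatures)

# List the UCSC genomes registered in the installed GenomeInfoDb version.
reg <- registered_UCSC_genomes()
is_registered <- "xenLae2" %in% reg$genome
is_registered

# Fallback sketch: build the TxDb from a locally downloaded GTF instead.
# The path below is a hypothetical placeholder, not a real file.
gtf_path <- "path/to/xenLae2.ncbiRefSeq.gtf.gz"
if (file.exists(gtf_path)) {
  txdb <- makeTxDbFromGFF(gtf_path, format = "gtf",
                          organism = "Xenopus laevis")
}
```

If is_registered comes back FALSE, the error is consistent with the installed package version, and the GTF route avoids the registration check altogether.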
Can anyone help me out? I feel like I'm 90% of the way there and just need a push over the finish line. There must be a way to take the geneId column I have and get the corresponding gene name/ENTREZID for Xenopus laevis. Once I have this, I need to convert the genes to their human counterparts, but I'm taking this one step at a time.
sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.6 RMariaDB_1.2.2 bedr_1.0.7
[4] clusterProfiler_4.6.0 ChIPseeker_1.34.1 fastqcr_0.1.2
[7] rtracklayer_1.58.0 org.Hs.eg.db_3.16.0 EnsDb.Hsapiens.v86_2.99.0
[10] ensembldb_2.22.0 AnnotationFilter_1.22.0 xlaevis.db_3.2.3
[13] org.Xl.eg.db_3.16.0 TxDb.Hsapiens.UCSC.hg38.knownGene_3.16.0 GenomicFeatures_1.50.3
[16] AnnotationDbi_1.60.0 Rsamtools_2.14.0 Biostrings_2.66.0
[19] XVector_0.38.0 ChIPQC_1.34.1 BiocParallel_1.32.5
[22] DiffBind_3.8.4 SummarizedExperiment_1.28.0 Biobase_2.58.0
[25] MatrixGenerics_1.10.0 matrixStats_0.63.0 GenomicRanges_1.50.2
[28] GenomeInfoDb_1.34.6 IRanges_2.32.0 S4Vectors_0.36.1
[31] BiocGenerics_0.44.0 ggplot2_3.4.0 BiocManager_1.30.19
loaded via a namespace (and not attached):
[1] utf8_1.2.2 R.utils_2.12.2 tidyselect_1.2.0
[4] RSQLite_2.2.20 htmlwidgets_1.6.1 grid_4.2.2
[7] scatterpie_0.1.8 munsell_0.5.0 codetools_0.2-18
[10] interp_1.1-3 systemPipeR_2.4.0 withr_2.5.0
[13] colorspace_2.0-3 GOSemSim_2.24.0 filelock_1.0.2
[16] rstudioapi_0.14 rJava_1.0-6 DOSE_3.24.2
[19] bbmle_1.0.25 GenomeInfoDbData_1.2.9 mixsqp_0.3-48
[22] hwriter_1.3.2.1 polyclip_1.10-4 bit64_4.0.5
[25] farver_2.1.1 downloader_0.4 treeio_1.22.0
[28] coda_0.19-4 vctrs_0.5.1 TxDb.Rnorvegicus.UCSC.rn4.ensGene_3.2.2
[31] generics_0.1.3 lambda.r_1.2.4 gson_0.0.9
[34] timechange_0.2.0 BiocFileCache_2.6.0 R6_2.5.1
[37] apeglm_1.20.0 graphlayouts_0.8.4 invgamma_1.1
[40] locfit_1.5-9.7 gridGraphics_0.5-1 bitops_1.0-7
[43] cachem_1.0.6 fgsea_1.24.0 DelayedArray_0.24.0
[46] assertthat_0.2.1 BiocIO_1.8.0 scales_1.2.1
[49] ggraph_2.1.0 enrichplot_1.18.3 gtable_0.3.1
[52] tidygraph_1.2.2 xlsx_0.6.5 rlang_1.0.6
[55] splines_4.2.2 lazyeval_0.2.2 yaml_2.3.6
[58] reshape2_1.4.4 TxDb.Dmelanogaster.UCSC.dm3.ensGene_3.2.2 qvalue_2.30.0
[61] tools_4.2.2 ggplotify_0.1.0 ellipsis_0.3.2
[64] gplots_3.1.3 RColorBrewer_1.1-3 Rcpp_1.0.9
[67] plyr_1.8.8 progress_1.2.2 zlibbioc_1.44.0
[70] purrr_1.0.1 RCurl_1.98-1.9 prettyunits_1.1.1
[73] deldir_1.0-6 viridis_0.6.2 ashr_2.2-54
[76] cowplot_1.1.1 chipseq_1.48.0 ggrepel_0.9.2
[79] magrittr_2.0.3 futile.options_1.0.1 TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
[82] openxlsx_4.2.5.1 truncnorm_1.0-8 mvtnorm_1.1-3
[85] SQUAREM_2021.1 amap_0.8-19 ProtGenerics_1.30.0
[88] TxDb.Mmusculus.UCSC.mm9.knownGene_3.2.2 patchwork_1.1.2 hms_1.1.2
[91] xlsxjars_0.6.1 HDO.db_0.99.1 XML_3.99-0.13
[94] VennDiagram_1.7.3 emdbook_1.3.12 jpeg_0.1-10
[97] gridExtra_2.3 testthat_3.1.6 compiler_4.2.2
[100] biomaRt_2.54.0 bdsmatrix_1.3-6 tibble_3.1.8
[103] shadowtext_0.1.2 KernSmooth_2.23-20 crayon_1.5.2
[106] R.oo_1.25.0 htmltools_0.5.4 ggfun_0.0.9
[109] tidyr_1.2.1 aplot_0.1.9 lubridate_1.9.0
[112] DBI_1.1.3 formatR_1.14 tweenr_2.0.2
[115] dbplyr_2.3.0 MASS_7.3-58.1 rappdirs_0.3.3
[118] boot_1.3-28.1 ShortRead_1.56.1 Matrix_1.5-3
[121] brio_1.1.3 cli_3.6.0 R.methodsS3_1.8.2
[124] parallel_4.2.2 igraph_1.3.5 pkgconfig_2.0.3
[127] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicAlignments_1.34.0 numDeriv_2016.8-1.1
[130] TxDb.Celegans.UCSC.ce6.ensGene_3.2.2 xml2_1.3.3 ggtree_3.6.2
[133] yulab.utils_0.0.6 stringr_1.5.0 digest_0.6.31
[136] fastmatch_1.1-3 tidytree_0.4.2 restfulr_0.0.15
[139] GreyListChIP_1.30.0 curl_5.0.0 gtools_3.9.4
[142] rjson_0.2.21 jsonlite_1.8.4 nlme_3.1-161
[145] lifecycle_1.0.3 futile.logger_1.4.3 viridisLite_0.4.1
[148] limma_3.54.0 BSgenome_1.66.2 fansi_1.0.3
[151] pillar_1.8.1 lattice_0.20-45 Nozzle.R1_1.1-1.1
[154] plotrix_3.8-2 KEGGREST_1.38.0 fastmap_1.1.0
[157] httr_1.4.4 GO.db_3.16.0 glue_1.6.2
[160] zip_2.2.2 png_0.1-8 bit_4.0.5
[163] ggforce_0.4.1 stringi_1.7.12 blob_1.2.3
[166] TxDb.Mmusculus.UCSC.mm10.knownGene_3.10.0 latticeExtra_0.6-30 caTools_1.18.2
[169] memoise_2.0.1 dplyr_1.0.10 ape_5.6-2
[172] irlba_2.3.5.1
James - I wish I knew where you lived so I could buy you a beer! Thank you so much for your help. I'm going to work now on converting these genes to human genes. I really can't thank you enough!