Entering edit mode
Hello everyone! I'm trying to perform GO and KEGG enrichment analysis. I also followed the advice given in this link: "No genes can be mapped...." using enrichGO in clusterProfiler using enrichGO in clusterProfiler But it does not work, can you help me?
Here the script and the error:
> #import data
> all_genesxlsx <- read_xlsx("D:/R2/Prova/all_genes.xlsx")
> diff_genesxlsx <- read_xlsx("D:/R2/Prova/diff_genes.xlsx")
>
>
> #conversion dataframe in vector of character
> all_genes <- as.character(all_genesxlsx)
> diff_genes <- as.character(diff_genesxlsx)
>
>
> #control
> is.character(all_genes)
[1] TRUE
> is.character(diff_genes)
[1] TRUE
>
>
> # Remove "" an \n
> diff_genes_def <- gsub("[\"\n]", "", diff_genes)
> all_genes_def <- gsub("[\"\n]", "", all_genes)
> print(all_genes_def)
[1] "c(WP_000002828.1, WP_000002852.1, WP_000003115.1, WP_000003406.1, WP_000003532.1, WP_000003544.1, WP_000003555.1, ...)
> print(diff_genes_def)
[1] "c(WP_000002828.1, WP_000002852.1, WP_000003115.1, WP_000003406.1, WP_000003532.1, WP_000003544.1, WP_000003555.1, ...)
> #GO_analysys
> GO_analysis <- enrichGO(gene = diff_genes_def,
+ universe = all_genes_def,
+ OrgDb = org.Abaumannii.eg.db,
+ keyType = "REFSEQ",
+ ont = "CC", # either "BP", "CC" or "MF",
+ pAdjustMethod = "none",
+ pvalueCutoff = 1,
+ qvalueCutoff = 1,
+ readable = FALSE)
--> No gene can be mapped....
--> Expected input gene ID: WP_001075596.1,WP_000959085.1,WP_000220477.1,WP_000776215.1,WP_000091932.1,WP_049581137.1
--> return NULL...
Here the session info:
> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=Italian_Italy.utf8 LC_CTYPE=Italian_Italy.utf8
[3] LC_MONETARY=Italian_Italy.utf8 LC_NUMERIC=C
[5] LC_TIME=Italian_Italy.utf8
time zone: Europe/Rome
tzcode source: internal
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_1.4.3 biomaRt_2.58.2 clusterProfiler_4.10.0
[4] org.Abaumannii.eg.db_0.0.1 AnnotationHub_3.10.0 BiocFileCache_2.10.1
[7] dbplyr_2.4.0 BiocManager_1.30.22 AnnotationForge_1.44.0
[10] AnnotationDbi_1.64.1 IRanges_2.36.0 S4Vectors_0.40.2
[13] Biobase_2.62.0 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.15.0 jsonlite_1.8.8
[4] magrittr_2.0.3 farver_2.1.1 fs_1.6.3
[7] zlibbioc_1.48.0 vctrs_0.6.5 memoise_2.0.1
[10] RCurl_1.98-1.14 ggtree_3.10.0 htmltools_0.5.7
[13] progress_1.2.3 curl_5.2.0 cellranger_1.1.0
[16] gridGraphics_0.5-1 plyr_1.8.9 cachem_1.0.8
[19] igraph_2.0.1.1 mime_0.12 lifecycle_1.0.4
[22] pkgconfig_2.0.3 Matrix_1.6-5 R6_2.5.1
[25] fastmap_1.1.1 gson_0.1.0 GenomeInfoDbData_1.2.11
[28] shiny_1.8.0 digest_0.6.34 aplot_0.2.2
[31] enrichplot_1.22.0 colorspace_2.1-0 patchwork_1.2.0
[34] RSQLite_2.3.5 filelock_1.0.3 fansi_1.0.6
[37] httr_1.4.7 polyclip_1.10-6 compiler_4.3.2
[40] bit64_4.0.5 withr_3.0.0 BiocParallel_1.36.0
[43] viridis_0.6.5 DBI_1.2.1 ggforce_0.4.1
[46] MASS_7.3-60.0.1 rappdirs_0.3.3 HDO.db_0.99.1
[49] tools_4.3.2 ape_5.7-1 scatterpie_0.2.1
[52] interactiveDisplayBase_1.40.0 httpuv_1.6.14 glue_1.7.0
[55] nlme_3.1-164 GOSemSim_2.28.1 promises_1.2.1
[58] grid_4.3.2 shadowtext_0.1.3 reshape2_1.4.4
[61] fgsea_1.28.0 generics_0.1.3 gtable_0.3.4
[64] tidyr_1.3.1 data.table_1.15.0 hms_1.1.3
[67] xml2_1.3.6 tidygraph_1.3.1 utf8_1.2.4
[70] XVector_0.42.0 ggrepel_0.9.5 BiocVersion_3.18.1
[73] pillar_1.9.0 stringr_1.5.1 yulab.utils_0.1.4
[76] later_1.3.2 splines_4.3.2 dplyr_1.1.4
[79] tweenr_2.0.2 treeio_1.26.0 lattice_0.22-5
[82] bit_4.0.5 tidyselect_1.2.0 GO.db_3.18.0
[85] Biostrings_2.70.2 gridExtra_2.3 graphlayouts_1.1.0
[88] stringi_1.8.3 lazyeval_0.2.2 ggfun_0.1.4
[91] yaml_2.3.8 codetools_0.2-19 ggraph_2.1.0
[94] tibble_3.2.1 qvalue_2.34.0 ggplotify_0.1.2
[97] cli_3.6.1 xtable_1.8-4 munsell_0.5.0
[100] Rcpp_1.0.12 GenomeInfoDb_1.38.5 png_0.1-8
[103] XML_3.99-0.16.1 parallel_4.3.2 ellipsis_0.3.2
[106] ggplot2_3.4.4 blob_1.2.4 prettyunits_1.2.0
[109] DOSE_3.28.2 bitops_1.0-7 viridisLite_0.4.2
[112] tidytree_0.4.6 scales_1.3.0 purrr_1.0.2
[115] crayon_1.5.2 rlang_1.1.3 cowplot_1.1.3
[118] fastmatch_1.1-4 KEGGREST_1.42.0
What do you get from
I get this:
So I do not understand if the file input is wrong, or i the org.package does not contain all the informations. To prepare the input files, I did the following: for
all_genes
, I took my genome and selected the specific RefSeq for each gene. Fordiff_genes
, on the other hand, I took the list of differentially expressed genes and associated each one with the corresponding RefSeq (based on my genome). I did all of this in Excel.Oh, I missed the second n in baumannii. Try with the correct spelling (and note that if you get an error that ends with "object 'org.Abaumanii.eg.db' not found", that's a hint that you probably either have a typo or haven't loaded the package yet).
Don't worry, I hadn't noticed it either.
Here is the result:
Well, that's the problem then. From a cursory check, it appears that things like WP_000002852.1 aren't actually Gene IDs, but instead are NCBI protein IDs.
Yes, I selected them because when I send the command using REFSEQ as the keytype, it asks for this information. For these reasons, I associated this code WP_000002852.1 to each gene
OK, then what do you get for
I get this
If none of the IDs in your all_genes_def object are in your
OrgDb
, then no test can be made.Oh no! Do you think there are problems related to the creation of the orgpackage or the input file?
What should the input file look like? In my case it is just one column with the list of codes.
Probably can I use bitr command? error in getting entrez ids for gene names Or this is not usefull in my case?
The
OrgDb
packages are just SQLite databases that can be used to (for example, in this case) map certain IDs to GO IDs, which can then be used to do a Fisher's exact test.The issue is that you have a set of IDs that you want to map, and when you check to see if any of those IDs are contained in the
OrgDb
, the answer is no. In which case it's impossible to map to the GO IDs, and you therefore cannot do the Fisher's exact test. You need anOrgDb
that actually contains the IDs that you want to map for this to work correctly.Using
bitr
won't help because that function is just meant to simplify querying anOrgDb
. It doesn't 'fix' the problem.Ok thank you, because I did the org.db of A. baumannii 470, as you suggested here Error MakeOrgPackagefromNCBI. I am not sure about the fact that I don't get the input genes wrong, could you point me to an example?
I also made that package. Here are the seven IDs you present in the first post
So from the limited number of IDs I have, only three map to the REFSEQ IDs in the
OrgDb
, and that matches up with the omnibus SQLite Db we already made. This probably has more to do with what data are available at NCBI for this organism than anything else.Hi James, thank you so much for your information and for the test you conducted. Since the issue is related to the information available on NCBI, what do you suggest I do? Because I work with this microorganism. Do you think it's possible to contact NCBI support for further information? These might be basic questions, but I'm completely in the dark, and perhaps you've already encountered similar issues.