Question

Adding geneID to list of gene locus IDs

0

Entering edit mode

Chruut • 0

@b1c69721

Last seen 3.2 years ago

Switzerland

I got some E. coli K12 sequencing results where the mutated genes were labelled with their gene locus ID (f.i. b0014) and the gene name (f.i. dnaK), but without the gene ID. If I want to do GO for instance with clusterProfiler it seems like all the functions only allow a list of gene ID as inputs. Using gseGO() it didn't work with either using gene names or gene locus IDs. An example of what I tried:

express = runif(10,0,100)
namelist=c("dnaK","dnaJ","hokC","mokC","insB1","insA1","rpsT","insL1","yaaW","yaaI")
names(express)=namelist
express = sort(express)
express = rev(express)
library(clusterProfiler)
BiocManager::install("org.EcK12.eg.db")
gse = gseGO(express, ont="MF",keyType="GENENAME",OrgDb="org.EcK12.eg.db")

There should be a bioconductor function which adds the gene ID to a list of gene locus IDs or gene names I assume. Which one is that?

sessionInfo( )

R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.5.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] clusterProfiler_4.6.0 AnnotationHub_3.6.0   BiocFileCache_2.6.0  
 [4] dbplyr_2.2.1          GOSemSim_2.24.0       GO.db_3.16.0         
 [7] AnnotationDbi_1.60.0  IRanges_2.32.0        S4Vectors_0.36.0     
[10] Biobase_2.58.0        BiocGenerics_0.44.0  

loaded via a namespace (and not attached):
  [1] fgsea_1.24.0                  colorspace_2.0-3             
  [3] ggtree_3.6.2                  gson_0.0.9                   
  [5] ellipsis_0.3.2                qvalue_2.30.0                
  [7] XVector_0.38.0                aplot_0.1.8                  
  [9] farver_2.1.1                  graphlayouts_0.8.3           
 [11] ggrepel_0.9.2                 bit64_4.0.5                  
 [13] scatterpie_0.1.8              interactiveDisplayBase_1.36.0
 [15] fansi_1.0.3                   codetools_0.2-18             
 [17] splines_4.2.2                 cachem_1.0.6                 
 [19] polyclip_1.10-4               jsonlite_1.8.3               
 [21] png_0.1-7                     ggforce_0.4.1                
 [23] shiny_1.7.3                   BiocManager_1.30.19          
 [25] compiler_4.2.2                httr_1.4.4                   
 [27] lazyeval_0.2.2                assertthat_0.2.1             
 [29] Matrix_1.5-3                  fastmap_1.1.0                
 [31] cli_3.4.1                     later_1.3.0                  
 [33] tweenr_2.0.2                  htmltools_0.5.3              
 [35] tools_4.2.2                   igraph_1.3.5                 
 [37] gtable_0.3.1                  glue_1.6.2                   
 [39] GenomeInfoDbData_1.2.9        reshape2_1.4.4               
 [41] dplyr_1.0.10                  rappdirs_0.3.3               
 [43] fastmatch_1.1-3               Rcpp_1.0.9                   
 [45] enrichplot_1.18.0             vctrs_0.5.1                  
 [47] Biostrings_2.66.0             ape_5.6-2                    
 [49] nlme_3.1-160                  ggraph_2.1.0                 
 [51] stringr_1.4.1                 mime_0.12                    
 [53] lifecycle_1.0.3               DOSE_3.24.1                  
 [55] org.Hs.eg.db_3.16.0           zlibbioc_1.44.0              
 [57] MASS_7.3-58.1                 scales_1.2.1                 
 [59] tidygraph_1.2.2               promises_1.2.0.1             
 [61] parallel_4.2.2                RColorBrewer_1.1-3           
 [63] yaml_2.3.6                    curl_4.3.3                   
 [65] memoise_2.0.1                 gridExtra_2.3                
 [67] ggplot2_3.4.0                 downloader_0.4               
 [69] ggfun_0.0.8                   HDO.db_0.99.1                
 [71] yulab.utils_0.0.5             stringi_1.7.8                
 [73] RSQLite_2.2.18                BiocVersion_3.16.0           
 [75] tidytree_0.4.1                filelock_1.0.2               
 [77] BiocParallel_1.32.1           org.EcK12.eg.db_3.16.0       
 [79] GenomeInfoDb_1.34.3           rlang_1.0.6                  
 [81] pkgconfig_2.0.3               bitops_1.0-7                 
 [83] lattice_0.20-45               purrr_0.3.5                  
 [85] treeio_1.22.0                 patchwork_1.1.2              
 [87] shadowtext_0.1.2              cowplot_1.1.1                
 [89] bit_4.0.5                     tidyselect_1.2.0             
 [91] plyr_1.8.8                    magrittr_2.0.3               
 [93] R6_2.5.1                      generics_0.1.3               
 [95] DBI_1.1.3                     pillar_1.8.1                 
 [97] withr_2.5.0                   KEGGREST_1.38.0              
 [99] RCurl_1.98-1.9                tibble_3.1.8                 
[101] crayon_1.5.2                  utf8_1.2.2                   
[103] viridis_0.6.2                 grid_4.2.2                   
[105] data.table_1.14.6             blob_1.2.3                   
[107] digest_0.6.30                 xtable_1.8-4                 
[109] tidyr_1.2.1                   httpuv_1.6.6                 
[111] gridGraphics_0.5-1            munsell_0.5.0                
[113] viridisLite_0.4.1             ggplotify_0.1.0

cluster OrgDb clusterProfiler • 2.7k views

ADD COMMENT • link written 3.2 years ago by Chruut • 0

score 1 · Answer 1 · 2022-11-18

When calling the function gseGO you should set the argument keyType="SYMBOL" (and not to the GENENAME).

(You can ignore the warnings because a very limited data set was analyzed, but it shows gseGO is working fine now).

> library(org.EcK12.eg.db)
> k <- keys(org.EcK12.eg.db)
> 
> # check the content of the annotation database; which column to use
> AnnotationDbi::select(org.EcK12.eg.db, keys=k[1:5], columns = c('ENTREZID', 'SYMBOL', 'GENENAME'), keytype = "ENTREZID")
'select()' returned 1:1 mapping between keys and columns
  ENTREZID SYMBOL
1   944740   yjhR
2   944741   nfrA
3   944742   thrL
4   944743  insB1
5   944744   sspA
                                                        GENENAME
1                                                         pseudo
2 exopolysaccharide secretion system outer membrane protein NfrA
3                                      thr operon leader peptide
4                                        IS1 family protein InsB
5                                 stringent starvation protein A
> 
> gse = gseGO(express, ont="MF",keyType="SYMBOL",OrgDb="org.EcK12.eg.db", pvalueCutoff = 1)
preparing geneSet collections...
GSEA analysis...
no term enriched under specific pvalueCutoff...
Warning message:
In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam,  :
  All values in the stats vector are greater than zero and scoreType is "std", maybe you should switch to scoreType = "pos".
> 
> gse
#
# Gene Set Enrichment Analysis
#
#...@organism    Escherichia coli 
#...@setType     MF 
#...@keytype     SYMBOL 
#...@geneList    Named num [1:10] 94.6 86.8 73.5 66.7 54.3 ...
 - attr(*, "names")= chr [1:10] "yaaI" "hokC" "dnaK" "insA1" ...
#...nPerm        
#...pvalues adjusted by 'BH' with cutoff <1 
#...0 enriched terms found
'data.frame':   0 obs. of  8 variables:
 $ ID             : chr 
 $ Description    : chr 
 $ setSize        : int 
 $ enrichmentScore: num 
 $ NES            : num 
 $ pvalue         : num 
 $ p.adjust       : num 
 $ qvalue         : num 
#...Citation
 T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu.
 clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.
 The Innovation. 2021, 2(3):100141 

>

score 0 · Answer 2 · 2022-12-05

0

Entering edit mode

Pedram • 0

@pedram-24790

Last seen 3.1 years ago

Canada

I have a gene list which are listed as Symbols. So, I prepared the code with ( keyType = "SYMBOL"). However, I receive the erro: preparing geneSet collections... --> Expected input gene ID: RAD54B,MAEL,TERB1,LGR4,PRLR,EREG Error in check_gene_id(geneList, geneSets) : --> No gene can be mapped....

ADD COMMENT • link 3.2 years ago Pedram • 0

0

Entering edit mode

Please don't add questions on to existing questions! Start a new one. Also, please read the FAQ and include enough information that we could help you.

ADD REPLY • link 3.2 years ago James W. MacDonald 68k

score 0 · Answer 3 · 2022-12-15

It's likely to be useful for me since I'm not planning to do an enrichment study but rather to compare the frequency of mutations occurring at distinct locations within a population using data from numerous samples. According to my understanding, the p-Value is irrelevant in my scenario since the fold change is expressed as a percentage. Adding geneID to list of gene locus IDs color by number