GSEA with a certain gene list returns no results with the human genome, but many results with the mouse genome
sawa

Hello. I am trying to perform GSEA on a ranked gene list derived from human samples. The code and the output are below.

> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking, 
+                                     OrgDb = "", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
no term enriched under specific pvalueCutoff...

I got no enriched terms. I therefore also tried the mouse annotation, because I had recently run a similar analysis on mouse datasets and obtained more than 1000 categories.

> HuSplCD103nEqCD103pEq_GSEA <- gseGO(geneList = HuSplCD103nEqCD103pEqRanking, 
+                                     OrgDb = "", ont = "BP", keyType = "SYMBOL", pAdjustMethod="none")
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).

preparing geneSet collections...
GSEA analysis...
leading edge analysis...
There were 23 warnings (use warnings() to see them)

This run returned 817 categories. What causes this difference? Is there simply far less GO information for human genes than for mouse genes? I am now tempted to use the mouse annotation on the human gene list, but wouldn't that be inappropriate for a human gene list?
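A quick way to check whether the difference comes from identifier matching, rather than from the amount of annotation, is to measure how many names in the ranked list are recognized as SYMBOL keys at all. The sketch below is plain base R; `symbol_overlap()` is a hypothetical helper, and the key vectors are toy stand-ins for what `AnnotationDbi::keys(<OrgDb>, keytype = "SYMBOL")` would return for a human or a mouse OrgDb.

```r
# Hypothetical helper: what fraction of the ranked list's names are valid keys?
symbol_overlap <- function(ranking, valid_keys) {
  ids <- names(ranking)
  found <- ids %in% valid_keys
  list(
    n_total  = length(ids),
    n_found  = sum(found),
    fraction = mean(found),
    missing  = ids[!found]
  )
}

# Toy ranking whose names use mouse-style capitalization
ranking <- c(Itgae = 2.1, Cd8a = 1.4, Gzmb = -0.3, Foxp3 = -1.8)

# Toy stand-ins for the SYMBOL keys of a human and a mouse OrgDb
human_keys <- c("ITGAE", "CD8A", "GZMB", "FOXP3")
mouse_keys <- c("Itgae", "Cd8a", "Gzmb", "Foxp3")

symbol_overlap(ranking, human_keys)$fraction  # 0
symbol_overlap(ranking, mouse_keys)$fraction  # 1
```

On real data, a fraction near zero against the human keys combined with a high fraction against the mouse keys would point to a naming mismatch rather than to missing human annotation.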

> sessionInfo()
R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)

Matrix products: default

[1] LC_COLLATE=Japanese_Japan.utf8  LC_CTYPE=Japanese_Japan.utf8    LC_MONETARY=Japanese_Japan.utf8
[4] LC_NUMERIC=C                    LC_TIME=Japanese_Japan.utf8    

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocManager_1.30.23         tibble_3.2.1                shiny_1.8.1.1              
 [5] ggsignif_0.6.4              data.table_1.15.4           loupeR_1.1.0                SingleR_2.6.0              
 [9] dplyr_1.1.4                 Seurat_5.1.0                SeuratObject_5.0.2          sp_2.1-4                   
[13]         ggplot2_3.5.1               enrichplot_1.24.0           openxlsx_4.2.5.2           
[17] clusterProfiler_4.12.0      limma_3.60.2               AnnotationDbi_1.66.0       
[21] readr_2.1.5                 SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 Biobase_2.64.0             
[25] GenomicRanges_1.56.0        GenomeInfoDb_1.40.0         IRanges_2.38.0              S4Vectors_0.42.0           
[29] BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.3.0          

loaded via a namespace (and not attached):
  [1] fs_1.6.4                  spatstat.sparse_3.0-3     HDO.db_0.99.1             httr_1.4.7               
  [5] RColorBrewer_1.1-3        tools_4.4.0               sctransform_0.4.1         utf8_1.2.4               
  [9] R6_2.5.1                  lazyeval_0.2.2            uwot_0.2.2                withr_3.0.0              
 [13] gridExtra_2.3             progressr_0.14.0          textshaping_0.3.7         cli_3.6.2                
 [17] spatstat.explore_3.2-7    fastDummies_1.7.3         scatterpie_0.2.3          labeling_0.4.3           
 [21] sass_0.4.9                spatstat.data_3.0-4       ggridges_0.5.6            pbapply_1.7-2            
 [25] systemfonts_1.1.0         yulab.utils_0.1.4         gson_0.1.0                DOSE_3.30.1              
 [29] parallelly_1.37.1         rstudioapi_0.16.0         RSQLite_2.3.6             generics_0.1.3           
 [33] gridGraphics_0.5-1        ica_1.0-3                 spatstat.random_3.2-3     vroom_1.6.5              
 [37] zip_2.3.1                 GO.db_3.19.1              Matrix_1.7-0              fansi_1.0.6              
 [41] abind_1.4-5               lifecycle_1.0.4           qvalue_2.36.0             SparseArray_1.4.3        
 [45] Rtsne_0.17                grid_4.4.0                blob_1.2.4                promises_1.3.0           
 [49] crayon_1.5.2              miniUI_0.1.1.1            lattice_0.22-6            beachmat_2.20.0          
 [53] cowplot_1.1.3             KEGGREST_1.44.0           pillar_1.9.0              fgsea_1.30.0             
 [57] future.apply_1.11.2       codetools_0.2-20          fastmatch_1.1-4           leiden_0.4.3.1           
 [61] glue_1.7.0                ggfun_0.1.5               vctrs_0.6.5               png_0.1-8                
 [65] treeio_1.28.0             spam_2.10-0               gtable_0.3.5              cachem_1.1.0             
 [69] S4Arrays_1.4.0            mime_0.12                 tidygraph_1.3.1           survival_3.5-8           
 [73] statmod_1.5.0             fitdistrplus_1.1-11       ROCR_1.0-11               nlme_3.1-164             
 [77] ggtree_3.12.0             bit64_4.0.5               RcppAnnoy_0.0.22          bslib_0.7.0              
 [81] irlba_2.3.5.1             KernSmooth_2.23-22        colorspace_2.1-0          DBI_1.2.3                
 [85] DESeq2_1.44.0             tidyselect_1.2.1          bit_4.0.5                 compiler_4.4.0           
 [89] hdf5r_1.3.10              DelayedArray_0.30.1       plotly_4.10.4             shadowtext_0.1.3         
 [93] scales_1.3.0              lmtest_0.9-40             stringr_1.5.1             digest_0.6.35            
 [97] goftest_1.2-3             spatstat.utils_3.0-4      XVector_0.44.0            htmltools_0.5.8.1        
[101] pkgconfig_2.0.3           sparseMatrixStats_1.16.0  fastmap_1.2.0             rlang_1.1.3              
[105] htmlwidgets_1.6.4         UCSC.utils_1.0.0          DelayedMatrixStats_1.26.0 farver_2.1.2             
[109] jquerylib_0.1.4           zoo_1.8-12                jsonlite_1.8.8            BiocParallel_1.38.0      
[113] GOSemSim_2.30.0           BiocSingular_1.20.0       magrittr_2.0.3            GenomeInfoDbData_1.2.12  
[117] ggplotify_0.1.2           dotCall64_1.1-1           patchwork_1.2.0           munsell_0.5.1            
[121] Rcpp_1.0.12               ape_5.8                   ggnewscale_0.4.10         viridis_0.6.5            
[125] reticulate_1.36.1         stringi_1.8.4             ggraph_2.2.1              zlibbioc_1.50.0          
[129] MASS_7.3-60.2             plyr_1.8.9                parallel_4.4.0            listenv_0.9.1            
[133] ggrepel_0.9.5             deldir_2.0-4              Biostrings_2.72.0         graphlayouts_1.1.1       
[137] splines_4.4.0             tensor_1.5                hms_1.1.3                 locfit_1.5-9.9           
[141] igraph_2.0.3              spatstat.geom_3.2-9       RcppHNSW_0.6.0            reshape2_1.4.4           
[145] ScaledMatrix_1.12.0       tzdb_0.4.0                tweenr_2.0.3              httpuv_1.6.15            
[149] RANN_2.6.1                tidyr_1.3.1               purrr_1.0.2               polyclip_1.10-6          
[153] future_1.33.2             scattermore_1.2           ggforce_0.4.2             rsvd_1.0.5               
[157] xtable_1.8-4              RSpectra_0.16-1           tidytree_0.4.6            later_1.3.2              
[161] ragg_1.3.2                viridisLite_0.4.2         snow_0.4-4                aplot_0.2.2              
[165] memoise_2.0.1             cluster_2.1.6             globals_0.16.3

Ideally you would use something better than gene symbols for this. I realize that biologists like gene symbols, but they are really terrible, since they are neither unique nor constrained in any way. Identifiers such as NCBI (Entrez) Gene IDs or Ensembl Gene IDs are at least assigned by a central authority that aims for uniqueness and identifiability.
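Switching to stable identifiers means the ranked list has to be re-keyed before it goes back into `gseGO()`. The base-R sketch below assumes a lookup table with `SYMBOL` and `ENTREZID` columns, the shape that `clusterProfiler::bitr()` returns; here the table is made up, the gene names are toy symbols, and the IDs are placeholders rather than real Entrez IDs. `rekey_ranking()` is a hypothetical helper: it drops unmapped symbols, keeps the first ID when a symbol maps to several, keeps the value with the largest absolute statistic when several symbols map to one ID, and returns the vector in decreasing order.

```r
# Hypothetical helper: re-key a named ranking from symbols to Entrez IDs
rekey_ranking <- function(ranking, lookup) {
  # keep one ID per symbol (the first row listed in the lookup table)
  lookup <- lookup[!duplicated(lookup$SYMBOL), ]
  ids <- lookup$ENTREZID[match(names(ranking), lookup$SYMBOL)]
  keep <- !is.na(ids)                  # drop symbols with no mapping
  stats <- ranking[keep]
  names(stats) <- ids[keep]
  # if several symbols hit the same ID, keep the largest absolute value
  stats <- stats[order(-abs(stats))]
  stats <- stats[!duplicated(names(stats))]
  sort(stats, decreasing = TRUE)       # gseGO expects a decreasing ranking
}

# Toy data: made-up symbols and placeholder IDs
ranking <- c(GeneA = 2.5, GeneB = -1.0, GeneC = 0.4, GeneD = -3.2)
lookup <- data.frame(
  SYMBOL   = c("GeneA", "GeneB", "GeneC", "GeneC"),
  ENTREZID = c("1001", "1002", "1001", "1003"),
  stringsAsFactors = FALSE
)

rekey_ranking(ranking, lookup)
# 1001 = 2.5, 1002 = -1.0 (GeneD has no mapping; GeneC collapses into 1001)
```

With real data, the lookup table might come from `bitr(names(ranking), fromType = "SYMBOL", toType = "ENTREZID", OrgDb = ...)` using the OrgDb for the species the samples came from, and the re-keyed vector would then be passed to `gseGO()` with `keyType = "ENTREZID"`.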

That said, note that human gene symbols are all uppercase (e.g. ITGAE), whereas mouse symbols have only the first letter capitalized (e.g. Itgae). It's highly likely that your gene symbols follow the mouse convention, which is why you are getting mappings for mouse and not for human.
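To see which convention a ranked list actually follows, the capitalization pattern of its names can be tallied. The sketch below is base R; `case_convention()` is a hypothetical helper, not part of clusterProfiler, and the heuristic is rough: some mouse symbols contain no lowercase letters (e.g. H2-K1) and some human symbols do (e.g. C1orf112), so both can be misclassified.

```r
# Hypothetical helper: tally symbols by capitalization pattern
case_convention <- function(symbols) {
  # human-style: no lowercase letters, at least one uppercase letter
  human_like <- symbols == toupper(symbols) & grepl("[A-Z]", symbols)
  # mouse-style: uppercase first letter, remaining letters lowercase
  mouse_like <- grepl("^[A-Z]", symbols) &
    substring(symbols, 2) == tolower(substring(symbols, 2)) &
    grepl("[a-z]", symbols)
  label <- ifelse(human_like, "human-style",
                  ifelse(mouse_like, "mouse-style", "other"))
  # fixed levels so all three categories appear, even with zero counts
  table(factor(label, levels = c("human-style", "mouse-style", "other")))
}

case_convention(c("ITGAE", "CD8A", "Itgae", "Cd8a", "Gzmb", "cd8a"))
# human-style 2, mouse-style 3, other 1
```

On real data, something like `case_convention(names(HuSplCD103nEqCD103pEqRanking))` would show whether the list is dominated by mouse-style names.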


Thank you so much! I did not know the convention that human gene symbols are written in uppercase and mouse symbols with only the first letter capitalized. When I saw such differences in papers, I assumed researchers chose them arbitrarily. I also converted the dataset being analyzed to mouse format, and now I have a reasonable GSEA result.

Next time, I will primarily use ENTREZID.

