Question

Error in gseDO function

0

Entering edit mode

Javier • 0

@8224c1d8

Last seen 13 days ago

Spain

I need some support with the disease ontology enrichment. I'm pretty new to this type of analysis and maybe it's too obvious what I'm asking. From: "DOSE(Yu et al. 2015) supports Disease Ontology (DO) Semantic and Enrichment analysis. The enrichDO function is very useful for identifying disease association of interesting genes, and function gseDO function is designed for gene set enrichment analysis of DO." and here. In broad terms I understood the concept of ORA vs GSEA .

The first object from enrichDO resulting in enrichResult object seems to produce a result. However the 2nd syntax with gseDO in order to perform a gsea is not working. Maybe I am not understanding the required input object to perform. I don't quite understand the difference in technical terms and why the enrichDO produces a valid result, however the gseDO doesn't produce an output, increasing the p-value cutoff to the roof.
The thing, is the result plausible with the data? Or am I missing something in what I'm doing?

Thank you in advance The appearance of the data


kegg_gene_list <-
c(`7042` = 0.365, `10135` = 0.218, `3553` = 0.175, `5291` = 0.167, 
`114548` = 0.163, `22861` = 0.089, `4780` = 0.078, `942` = 0.061, 
`3902` = -0.005, `3586` = -0.011, `1029` = -0.186, `3458` = -0.282
)

Functional analysis.

#enrichDO
edo <- enrichDO(gene          = genes_names,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)

#gseDO
y <- gseDO(kegg_gene_list,
           minGSSize     = 120,
           pvalueCutoff  = 1,
           pAdjustMethod = "BH",
           verbose       = FALSE)

no term enriched under specific pvalueCutoff...


sessionInfo( )
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8    LC_MONETARY=Spanish_Spain.utf8
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.utf8    

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
  [1] DOSE_3.28.2            clusterProfiler_4.10.1 org.Hs.eg.db_3.18.0    AnnotationDbi_1.64.1  
  [5] IRanges_2.36.0         S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1   
  [9] rlang_1.1.2            writexl_1.5.0          mice_3.16.0            gptstudio_0.3.0       
 [13] caret_6.0-94           diffdf_1.0.4           ggord_1.1.7            logspline_2.1.21      
 [17] fitdistrplus_1.1-11    survival_3.5-5         Factoshiny_2.5         FactoInvestigate_1.9  
 [21] shiny_1.8.0            FactoMineR_2.10        textshape_1.7.3        ggrepel_0.9.5         
 [25] missMDA_1.19           factoextra_1.0.7       ggthemes_5.1.0         Rmisc_1.5.1           
 [29] gridExtra_2.3          reactable_0.4.4        stargazer_5.2.3        mediation_4.5.0       
 [33] sandwich_3.1-0         mvtnorm_1.2-4          MASS_7.3-60            creditmodel_1.3.1     
 [37] effects_4.2-2          ggeffects_1.5.0        MCMCglmm_2.35          ape_5.7-1             
 [41] coda_0.19-4.1          dotwhisker_0.8.1       ggpointdensity_0.1.0   gganimate_1.0.9       
 [45] gam_1.22-3             foreach_1.5.2          afex_1.3-1             lme4_1.1-35.1         
 [49] Matrix_1.6-5           forecast_8.22.0        broom.mixed_0.2.9.4    nlme_3.1-162          
 [53] daff_1.1.1             outliers_0.15          styler_1.10.2          datapasta_3.1.0       
 [57] hutils_1.8.1           testthat_3.2.1         repurrrsive_1.1.0      profvis_0.3.8         
 [61] microbenchmark_1.4.10  here_1.0.1             reprex_2.1.0           magrittr_2.0.3        
 [65] xtable_1.8-4           plyr_1.8.9             reshape2_1.4.4         tinytex_0.50          
 [69] broom_1.0.5            lattice_0.21-8         ggpubr_0.6.0           datarium_0.1.0        
 [73] rstatix_0.7.2          DataEditR_0.1.5        akima_0.6-3.4          robustbase_0.99-2     
 [77] reshape_0.8.9          pastecs_1.4.2          mvoutlier_2.1.1        sgeostat_1.0-27       
 [81] car_3.1-2              carData_3.0-5          emmeans_1.10.0         pryr_0.1.6            
 [85] compareGroups_4.8.0    gdata_3.0.0            chron_2.3-61           pgirmess_2.0.3        
 [89] irr_0.84.1             lpSolve_5.6.20         epitools_0.5-10.1      Hmisc_5.1-2           
 [93] data.table_1.15.2      lubridate_1.9.3        forcats_1.0.0          stringr_1.5.1         
 [97] purrr_1.0.2            readr_2.1.5            tidyr_1.3.1            tibble_3.2.1          
[101] ggplot2_3.5.0          tidyverse_2.0.0        foreign_0.8-84         haven_2.5.4           
[105] dplyr_1.1.3            pacman_0.5.1          

loaded via a namespace (and not attached):
  [1] igraph_2.0.3            Formula_1.2-5           zlibbioc_1.48.2        
  [4] bit_4.0.5               tidyselect_1.2.1        doParallel_1.0.17      
  [7] blob_1.2.4              parallel_4.3.1          png_0.1-8              
 [10] ggplotify_0.1.2         cli_3.6.1               bayestestR_0.13.2      
 [13] askpass_1.2.0           openssl_2.1.1           textshaping_0.3.7      
 [16] officer_0.6.5           shadowtext_0.1.3        curl_5.2.1             
 [19] tidytree_0.4.6          mime_0.12               evaluate_0.23          
 [22] V8_4.4.2                stringi_1.8.3           pROC_1.18.5            
 [25] backports_1.4.1         lmerTest_3.1-3          httpuv_1.6.14          
 [28] leaps_3.1               prodlim_2023.08.28      ggraph_2.2.1           
 [31] wk_0.9.1                DT_0.32                 DBI_1.2.2              
 [34] jquerylib_0.1.4         withr_3.0.0             corpcor_1.6.10         
 [37] tseries_0.10-55         class_7.3-22            systemfonts_1.0.6      
 [40] enrichplot_1.22.0       rprojroot_2.0.4         xgboost_1.7.7.1        
 [43] lmtest_0.9-40           brio_1.1.4              tidygraph_1.3.1        
 [46] colourpicker_1.3.0      BiocManager_1.30.23     htmlwidgets_1.6.4      
 [49] fs_1.6.3                DEoptimR_1.1-3          flashClust_1.01-2      
 [52] truncnorm_1.0-9         rhandsontable_0.3.8     zoo_1.8-12             
 [55] XVector_0.42.0          knitr_1.45              timechange_0.3.0       
 [58] fansi_1.0.5             patchwork_1.2.0         ggtree_3.10.1          
 [61] timeDate_4032.109       pan_1.9                 R.oo_1.26.0            
 [64] gridGraphics_0.5-1      ellipsis_0.3.2          lazyeval_0.2.2         
 [67] yaml_2.3.8              crayon_1.5.2            tensorA_0.36.2.1       
 [70] RColorBrewer_1.1-3      tweenr_2.0.3            later_1.3.2            
 [73] gfonts_0.2.0            codetools_0.2-19        base64enc_0.1-3        
 [76] KEGGREST_1.42.0         shape_1.4.6.1           estimability_1.5       
 [79] gdtools_0.3.7           shinyBS_0.61.1          pkgconfig_2.0.3        
 [82] shinyjqui_0.4.1         xml2_1.3.6              aplot_0.2.2            
 [85] scatterplot3d_0.3-44    viridisLite_0.4.2       performance_0.11.0     
 [88] httr_1.4.7              tools_4.3.1             globals_0.16.3         
 [91] hardhat_1.3.1           htmlTable_2.4.2         checkmate_2.3.1        
 [94] HDO.db_0.99.1           shinyjs_2.1.0           assertthat_0.2.1       
 [97] digest_0.6.35           numDeriv_2016.8-1.1     furrr_0.3.1            
[100] farver_2.1.1            tzdb_0.4.0              viridis_0.6.5          
[103] yulab.utils_0.1.4       ModelMetrics_1.2.2.2    rpart_4.1.19           
[106] crul_1.4.0              glue_1.6.2              cachem_1.0.8           
[109] fracdiff_1.5-3          polyclip_1.10-6         Biostrings_2.70.3      
[112] generics_0.1.3          classInt_0.4-10         spdep_1.3-3            
[115] survey_4.4-1            parallelly_1.37.1       R.cache_0.16.0         
[118] ragg_1.3.0              fontBitstreamVera_0.1.1 minqa_1.2.6            
[121] tcltk_4.3.1             glmnet_4.1-8            ggstance_0.3.6         
[124] gson_0.1.0              utf8_1.2.4              gower_1.0.1            
[127] graphlayouts_1.1.1      mitools_2.4             gtools_3.9.5           
[130] datawizard_0.10.0       httpcode_0.3.0          ggsignif_0.6.4         
[133] GenomeInfoDbData_1.2.11 lava_1.8.0              R.utils_2.12.3         
[136] parameters_0.21.6       RCurl_1.98-1.14         memoise_2.0.1          
[139] rmarkdown_2.26          scales_1.3.0            R.methodsS3_1.8.2      
[142] future_1.33.1           svglite_2.1.3           fontLiberation_0.1.0   
[145] rstudioapi_0.15.0       cluster_2.1.4           hms_1.1.3              
[148] cowplot_1.1.3           munsell_0.5.0           colorspace_2.1-0       
[151] jomo_2.7-6              quadprog_1.5-8          Rsolnp_1.16            
[154] GenomeInfoDb_1.38.8     s2_1.1.6                xts_0.13.2             
[157] ipred_0.9-14            shinydashboard_0.7.2    quantmod_0.4.26        
[160] ggforce_0.4.2           xfun_0.42               multcompView_0.1-10    
[163] e1071_1.7-14            urca_1.3-3              TH.data_1.1-2          
[166] recipes_1.0.10          iterators_1.0.14        abind_1.4-5            
[169] GOSemSim_2.28.1         treeio_1.26.0           bitops_1.0-7           
[172] promises_1.2.1          scatterpie_0.2.2        qvalue_2.34.0          
[175] RSQLite_2.3.6           openxlsx_4.2.5.2        fgsea_1.28.0           
[178] proxy_0.4-27            GO.db_3.18.0            compiler_4.3.1         
[181] prettyunits_1.2.0       boot_1.3-28.1           listenv_0.9.1          
[184] Rcpp_1.0.12             fontquiver_0.2.1        units_0.8-5            
[187] progress_1.2.3          BiocParallel_1.36.0     uuid_1.2-0             
[190] insight_0.19.10         cubature_2.1.0          R6_2.5.1               
[193] fastmap_1.1.1           multcomp_1.4-25         fastmatch_1.1-4        
[196] TTR_0.24.4              mitml_0.4-5             nnet_7.3-19            
[199] gtable_0.3.4            KernSmooth_2.23-21      miniUI_0.1.1.1         
[202] deldir_2.0-4            htmltools_0.5.7         bit64_4.0.5            
[205] lifecycle_1.0.4         sf_1.0-15               HardyWeinberg_1.7.5    
[208] zip_2.3.1               kableExtra_1.4.0        spData_2.3.0           
[211] nloptr_2.0.3            sass_0.4.9              vctrs_0.6.4            
[214] flextable_0.9.5         ggfun_0.1.4             sp_2.1-3               
[217] future.apply_1.11.1     bslib_0.6.1             pillar_1.9.0           
[220] jsonlite_1.8.8

DOSE gsea • 307 views

ADD COMMENT • link 29 days ago • updated 27 days ago Javier • 0

score 1 · Answer 1 · 2024-05-27

1

Entering edit mode

Guido Hooiveld ★ 4.0k

@guido-hooiveld-2020

Last seen 19 hours ago

Wageningen University, Wageningen, the …

Yes, you are missing something. :-)

For a GSEA you will need to use as input all genes that you are analyzing; these genes should be ranked on a metric, for example a signed log(p-value). See e.g. here: what the test method for enrichGO in clusterProfiler? or https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html

Also, for your current (wrong) GSEA: note that you set the argument minGSSize = 120 when running gseDO, but you only used 12 genes as input... This explains that the message "no term is enriched..." was returned. Again, use a ranked list of all your genes as input, and not just 12.

ADD COMMENT • link 29 days ago Guido Hooiveld ★ 4.0k

0

Entering edit mode

Thank you for your answer!

The thing is I'm working with 12 genes (I did also some targeted cholesterol efflux-related genes, but for sharing the platform under a completely different hypothesis). As you see it's not a typical RNAseq study, it's an RT-qPCR study, with selected genes. However I also count on fold-change values, and I am interested in how they relate to the pathology according to the databases, even though I know from the previous research that are related.

When modifying minGSSize to 5 or 10, it's possible to perform an enrichment analysis. I am interested, specially in the methods, in what I am trying to perform. Therefore I ask for the opinion on the viability of the analysis. How should interpret the result y which in fact comes from just a list of genes without ranked p-value, what am I seeing here is not correct? In case it's not I performed a comparison between groups (control vs treatment) and obtain p-value along with the fold-change values.

In the gseaResult = y, is it possible to focus on one specific disease and visualize how the is the relationship with the disease and the selected genes?

I really appreciated your answer


y <- gseDO(kegg_gene_list,
           minGSSize     = 5,
           pvalueCutoff  = 1,
           pAdjustMethod = "BH",
           verbose       = FALSE)

ADD REPLY • link 29 days ago Javier • 0

0

Entering edit mode

I don't fully understand what you are asking... Yet, in my opinion it doesn't make sense to use the results of only 12 genes for ORA (nor GSEA). Since these genes are 'extremely biased' (i mean, you have selected them for qPCR because of a clear biological interest) these type of analysis will not tell you anything new, only that these genes are involved in processes you already know as biological domain expert.

ADD REPLY • link 28 days ago Guido Hooiveld ★ 4.0k

0

Entering edit mode

As you say it is not a typical experiment of discovery, actually it is carried out in a different way, with a preselected set of genes. We were looking to find outthe "impact" in pathways and to research other genes highly involved in this biological/pathological processes.

What I meant is to select an specific DOID i.e. 2349 = arteriosclerosis, and go deeper into this disease of interest. Once I'm there I would like to see the interaction between the genes of interest in the pathway, and how the interrelate. I'm open to suggestions, as I said I'm pretty new to these tools. I was interested also in the research for other relevant genes in the most significant pathways (atherosclerosis, alzheimer disease...)

ADD REPLY • link 27 days ago Javier • 0