Error in gseDO function
Entering edit mode
Javier • 0
Last seen 8 days ago

I need some support with the disease ontology enrichment. I'm pretty new to this type of analysis and maybe it's too obvious what I'm asking. From: "DOSE(Yu et al. 2015) supports Disease Ontology (DO) Semantic and Enrichment analysis. The enrichDO function is very useful for identifying disease association of interesting genes, and function gseDO function is designed for gene set enrichment analysis of DO." and here. In broad terms I understood the concept of ORA vs GSEA .

The first object from enrichDO resulting in enrichResult object seems to produce a result. However the 2nd syntax with gseDO in order to perform a gsea is not working. Maybe I am not understanding the required input object to perform. I don't quite understand the difference in technical terms and why the enrichDO produces a valid result, however the gseDO doesn't produce an output, increasing the p-value cutoff to the roof.
The thing, is the result plausible with the data? Or am I missing something in what I'm doing?

Thank you in advance The appearance of the data

kegg_gene_list <-
c(`7042` = 0.365, `10135` = 0.218, `3553` = 0.175, `5291` = 0.167, 
`114548` = 0.163, `22861` = 0.089, `4780` = 0.078, `942` = 0.061, 
`3902` = -0.005, `3586` = -0.011, `1029` = -0.186, `3458` = -0.282

Functional analysis.

edo <- enrichDO(gene          = genes_names,
              ont           = "DO",
              pvalueCutoff  = 0.05,
              pAdjustMethod = "BH",
              minGSSize     = 5,
              maxGSSize     = 500,
              qvalueCutoff  = 0.05,
              readable      = FALSE)

y <- gseDO(kegg_gene_list,
           minGSSize     = 120,
           pvalueCutoff  = 1,
           pAdjustMethod = "BH",
           verbose       = FALSE)

no term enriched under specific pvalueCutoff...

sessionInfo( )
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

[1] LC_COLLATE=Spanish_Spain.utf8  LC_CTYPE=Spanish_Spain.utf8    LC_MONETARY=Spanish_Spain.utf8
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.utf8    

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils     datasets  methods  
[10] base     

other attached packages:
  [1] DOSE_3.28.2            clusterProfiler_4.10.1    AnnotationDbi_1.64.1  
  [5] IRanges_2.36.0         S4Vectors_0.40.2       Biobase_2.62.0         BiocGenerics_0.48.1   
  [9] rlang_1.1.2            writexl_1.5.0          mice_3.16.0            gptstudio_0.3.0       
 [13] caret_6.0-94           diffdf_1.0.4           ggord_1.1.7            logspline_2.1.21      
 [17] fitdistrplus_1.1-11    survival_3.5-5         Factoshiny_2.5         FactoInvestigate_1.9  
 [21] shiny_1.8.0            FactoMineR_2.10        textshape_1.7.3        ggrepel_0.9.5         
 [25] missMDA_1.19           factoextra_1.0.7       ggthemes_5.1.0         Rmisc_1.5.1           
 [29] gridExtra_2.3          reactable_0.4.4        stargazer_5.2.3        mediation_4.5.0       
 [33] sandwich_3.1-0         mvtnorm_1.2-4          MASS_7.3-60            creditmodel_1.3.1     
 [37] effects_4.2-2          ggeffects_1.5.0        MCMCglmm_2.35          ape_5.7-1             
 [41] coda_0.19-4.1          dotwhisker_0.8.1       ggpointdensity_0.1.0   gganimate_1.0.9       
 [45] gam_1.22-3             foreach_1.5.2          afex_1.3-1             lme4_1.1-35.1         
 [49] Matrix_1.6-5           forecast_8.22.0        broom.mixed_0.2.9.4    nlme_3.1-162          
 [53] daff_1.1.1             outliers_0.15          styler_1.10.2          datapasta_3.1.0       
 [57] hutils_1.8.1           testthat_3.2.1         repurrrsive_1.1.0      profvis_0.3.8         
 [61] microbenchmark_1.4.10  here_1.0.1             reprex_2.1.0           magrittr_2.0.3        
 [65] xtable_1.8-4           plyr_1.8.9             reshape2_1.4.4         tinytex_0.50          
 [69] broom_1.0.5            lattice_0.21-8         ggpubr_0.6.0           datarium_0.1.0        
 [73] rstatix_0.7.2          DataEditR_0.1.5        akima_0.6-3.4          robustbase_0.99-2     
 [77] reshape_0.8.9          pastecs_1.4.2          mvoutlier_2.1.1        sgeostat_1.0-27       
 [81] car_3.1-2              carData_3.0-5          emmeans_1.10.0         pryr_0.1.6            
 [85] compareGroups_4.8.0    gdata_3.0.0            chron_2.3-61           pgirmess_2.0.3        
 [89] irr_0.84.1             lpSolve_5.6.20         epitools_0.5-10.1      Hmisc_5.1-2           
 [93] data.table_1.15.2      lubridate_1.9.3        forcats_1.0.0          stringr_1.5.1         
 [97] purrr_1.0.2            readr_2.1.5            tidyr_1.3.1            tibble_3.2.1          
[101] ggplot2_3.5.0          tidyverse_2.0.0        foreign_0.8-84         haven_2.5.4           
[105] dplyr_1.1.3            pacman_0.5.1          

loaded via a namespace (and not attached):
  [1] igraph_2.0.3            Formula_1.2-5           zlibbioc_1.48.2        
  [4] bit_4.0.5               tidyselect_1.2.1        doParallel_1.0.17      
  [7] blob_1.2.4              parallel_4.3.1          png_0.1-8              
 [10] ggplotify_0.1.2         cli_3.6.1               bayestestR_0.13.2      
 [13] askpass_1.2.0           openssl_2.1.1           textshaping_0.3.7      
 [16] officer_0.6.5           shadowtext_0.1.3        curl_5.2.1             
 [19] tidytree_0.4.6          mime_0.12               evaluate_0.23          
 [22] V8_4.4.2                stringi_1.8.3           pROC_1.18.5            
 [25] backports_1.4.1         lmerTest_3.1-3          httpuv_1.6.14          
 [28] leaps_3.1               prodlim_2023.08.28      ggraph_2.2.1           
 [31] wk_0.9.1                DT_0.32                 DBI_1.2.2              
 [34] jquerylib_0.1.4         withr_3.0.0             corpcor_1.6.10         
 [37] tseries_0.10-55         class_7.3-22            systemfonts_1.0.6      
 [40] enrichplot_1.22.0       rprojroot_2.0.4         xgboost_1.7.7.1        
 [43] lmtest_0.9-40           brio_1.1.4              tidygraph_1.3.1        
 [46] colourpicker_1.3.0      BiocManager_1.30.23     htmlwidgets_1.6.4      
 [49] fs_1.6.3                DEoptimR_1.1-3          flashClust_1.01-2      
 [52] truncnorm_1.0-9         rhandsontable_0.3.8     zoo_1.8-12             
 [55] XVector_0.42.0          knitr_1.45              timechange_0.3.0       
 [58] fansi_1.0.5             patchwork_1.2.0         ggtree_3.10.1          
 [61] timeDate_4032.109       pan_1.9                 R.oo_1.26.0            
 [64] gridGraphics_0.5-1      ellipsis_0.3.2          lazyeval_0.2.2         
 [67] yaml_2.3.8              crayon_1.5.2            tensorA_0.36.2.1       
 [70] RColorBrewer_1.1-3      tweenr_2.0.3            later_1.3.2            
 [73] gfonts_0.2.0            codetools_0.2-19        base64enc_0.1-3        
 [76] KEGGREST_1.42.0         shape_1.4.6.1           estimability_1.5       
 [79] gdtools_0.3.7           shinyBS_0.61.1          pkgconfig_2.0.3        
 [82] shinyjqui_0.4.1         xml2_1.3.6              aplot_0.2.2            
 [85] scatterplot3d_0.3-44    viridisLite_0.4.2       performance_0.11.0     
 [88] httr_1.4.7              tools_4.3.1             globals_0.16.3         
 [91] hardhat_1.3.1           htmlTable_2.4.2         checkmate_2.3.1        
 [94] HDO.db_0.99.1           shinyjs_2.1.0           assertthat_0.2.1       
 [97] digest_0.6.35           numDeriv_2016.8-1.1     furrr_0.3.1            
[100] farver_2.1.1            tzdb_0.4.0              viridis_0.6.5          
[103] yulab.utils_0.1.4       ModelMetrics_1.2.2.2    rpart_4.1.19           
[106] crul_1.4.0              glue_1.6.2              cachem_1.0.8           
[109] fracdiff_1.5-3          polyclip_1.10-6         Biostrings_2.70.3      
[112] generics_0.1.3          classInt_0.4-10         spdep_1.3-3            
[115] survey_4.4-1            parallelly_1.37.1       R.cache_0.16.0         
[118] ragg_1.3.0              fontBitstreamVera_0.1.1 minqa_1.2.6            
[121] tcltk_4.3.1             glmnet_4.1-8            ggstance_0.3.6         
[124] gson_0.1.0              utf8_1.2.4              gower_1.0.1            
[127] graphlayouts_1.1.1      mitools_2.4             gtools_3.9.5           
[130] datawizard_0.10.0       httpcode_0.3.0          ggsignif_0.6.4         
[133] GenomeInfoDbData_1.2.11 lava_1.8.0              R.utils_2.12.3         
[136] parameters_0.21.6       RCurl_1.98-1.14         memoise_2.0.1          
[139] rmarkdown_2.26          scales_1.3.0            R.methodsS3_1.8.2      
[142] future_1.33.1           svglite_2.1.3           fontLiberation_0.1.0   
[145] rstudioapi_0.15.0       cluster_2.1.4           hms_1.1.3              
[148] cowplot_1.1.3           munsell_0.5.0           colorspace_2.1-0       
[151] jomo_2.7-6              quadprog_1.5-8          Rsolnp_1.16            
[154] GenomeInfoDb_1.38.8     s2_1.1.6                xts_0.13.2             
[157] ipred_0.9-14            shinydashboard_0.7.2    quantmod_0.4.26        
[160] ggforce_0.4.2           xfun_0.42               multcompView_0.1-10    
[163] e1071_1.7-14            urca_1.3-3              TH.data_1.1-2          
[166] recipes_1.0.10          iterators_1.0.14        abind_1.4-5            
[169] GOSemSim_2.28.1         treeio_1.26.0           bitops_1.0-7           
[172] promises_1.2.1          scatterpie_0.2.2        qvalue_2.34.0          
[175] RSQLite_2.3.6           openxlsx_4.2.5.2        fgsea_1.28.0           
[178] proxy_0.4-27            GO.db_3.18.0            compiler_4.3.1         
[181] prettyunits_1.2.0       boot_1.3-28.1           listenv_0.9.1          
[184] Rcpp_1.0.12             fontquiver_0.2.1        units_0.8-5            
[187] progress_1.2.3          BiocParallel_1.36.0     uuid_1.2-0             
[190] insight_0.19.10         cubature_2.1.0          R6_2.5.1               
[193] fastmap_1.1.1           multcomp_1.4-25         fastmatch_1.1-4        
[196] TTR_0.24.4              mitml_0.4-5             nnet_7.3-19            
[199] gtable_0.3.4            KernSmooth_2.23-21      miniUI_0.1.1.1         
[202] deldir_2.0-4            htmltools_0.5.7         bit64_4.0.5            
[205] lifecycle_1.0.4         sf_1.0-15               HardyWeinberg_1.7.5    
[208] zip_2.3.1               kableExtra_1.4.0        spData_2.3.0           
[211] nloptr_2.0.3            sass_0.4.9              vctrs_0.6.4            
[214] flextable_0.9.5         ggfun_0.1.4             sp_2.1-3               
[217] future.apply_1.11.1     bslib_0.6.1             pillar_1.9.0           
[220] jsonlite_1.8.8
DOSE gsea • 294 views
Entering edit mode
Guido Hooiveld ★ 4.0k
Last seen 1 day ago
Wageningen University, Wageningen, the …

Yes, you are missing something. :-)

For a GSEA you will need to use as input all genes that you are analyzing; these genes should be ranked on a metric, for example a signed log(p-value). See e.g. here: what the test method for enrichGO in clusterProfiler? or

Also, for your current (wrong) GSEA: note that you set the argument minGSSize = 120 when running gseDO, but you only used 12 genes as input... This explains that the message "no term is enriched..." was returned. Again, use a ranked list of all your genes as input, and not just 12.

Entering edit mode

Thank you for your answer!

The thing is I'm working with 12 genes (I did also some targeted cholesterol efflux-related genes, but for sharing the platform under a completely different hypothesis). As you see it's not a typical RNAseq study, it's an RT-qPCR study, with selected genes. However I also count on fold-change values, and I am interested in how they relate to the pathology according to the databases, even though I know from the previous research that are related.

When modifying minGSSize to 5 or 10, it's possible to perform an enrichment analysis. I am interested, specially in the methods, in what I am trying to perform. Therefore I ask for the opinion on the viability of the analysis. How should interpret the result y which in fact comes from just a list of genes without ranked p-value, what am I seeing here is not correct? In case it's not I performed a comparison between groups (control vs treatment) and obtain p-value along with the fold-change values.

In the gseaResult = y, is it possible to focus on one specific disease and visualize how the is the relationship with the disease and the selected genes?

I really appreciated your answer

y <- gseDO(kegg_gene_list,
           minGSSize     = 5,
           pvalueCutoff  = 1,
           pAdjustMethod = "BH",
           verbose       = FALSE)
Entering edit mode

I don't fully understand what you are asking... Yet, in my opinion it doesn't make sense to use the results of only 12 genes for ORA (nor GSEA). Since these genes are 'extremely biased' (i mean, you have selected them for qPCR because of a clear biological interest) these type of analysis will not tell you anything new, only that these genes are involved in processes you already know as biological domain expert.

Entering edit mode

As you say it is not a typical experiment of discovery, actually it is carried out in a different way, with a preselected set of genes. We were looking to find outthe "impact" in pathways and to research other genes highly involved in this biological/pathological processes.

What I meant is to select an specific DOID i.e. 2349 = arteriosclerosis, and go deeper into this disease of interest. Once I'm there I would like to see the interaction between the genes of interest in the pathway, and how the interrelate. I'm open to suggestions, as I said I'm pretty new to these tools. I was interested also in the research for other relevant genes in the most significant pathways (atherosclerosis, alzheimer disease...)


Login before adding your answer.

Traffic: 379 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6