Question

What is the test method for enrichGO in clusterProfiler?

xiaofeiwang18266 ▴ 50

What is the test method used by enrichGO in clusterProfiler? Is there a parameter to set it?

I ask because I got 0 enriched terms, whereas another method gave me some enriched terms that look meaningful. For example, I got enriched terms with Fisher's exact test in the TAIR GO enrichment tool. I wanted to compare the results of the different methods, but clusterProfiler returned nothing.

clusterProfiler • 6.7k views

ADD COMMENT • link 2.3 years ago xiaofeiwang18266 ▴ 50

Did you have a look at the vignette (help page) of this function? In R type: ?enrichGO.

Did you set (or change, or 'play' with) the significance cut-off so that it matches the one you used for the TAIR GO enrichment analysis? See e.g. Cluster profiler - KEGG analysis

enrichGO performs a so-called over-representation test, which corresponds to a one-sided version of Fisher’s exact test. See also here: https://yulab-smu.top/biomedical-knowledge-mining-book/enrichment-overview.html
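To make "one-sided Fisher's exact test" concrete: for a single GO term it amounts to the upper tail of a hypergeometric distribution. Below is a minimal Python sketch of that calculation, not clusterProfiler's actual code. The function name `ora_pvalue` and its argument names are made up for illustration.

```python
from math import comb


def ora_pvalue(k, n, M, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    Hypothetical argument names, chosen for this sketch only:
      k: genes in your input list that are annotated to the term
      n: size of your input gene list
      M: background genes annotated to the term
      N: size of the background (universe)
    """
    upper = min(n, M)          # cannot observe more hits than this
    total = comb(N, n)         # number of ways to draw n genes from N
    tail = sum(
        comb(M, i) * comb(N - M, n - i)   # ways to get exactly i hits
        for i in range(max(k, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 3 of 3 input genes hit a term with 3 annotated genes
    # in a background of 10 genes.
    print(ora_pvalue(3, 3, 3, 10))
```

Note that the result depends on the background size N and on which genes count as annotated, so two tools that use different backgrounds or annotation versions can give different p-values for the same gene list even when the test itself is identical.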

ADD REPLY • link 2.3 years ago Guido Hooiveld ★ 3.9k

@Guido Hooiveld Thanks a lot for the quick reply!

Looking at the vignette, I didn't see a parameter for setting the test method, so I assumed enrichGO uses only one method, the over-representation test you referred me to.

I am not playing with the cut-off values. My concern was comparing against the TAIR GO enrichment at the same thresholds. For TAIR GO, I used Fisher's exact test with FDR correction for multiple testing, which found some GO terms of interest with very high fold change and significant adjusted p-values. With enrichGO, I didn't find any enriched GO terms at the same cut-off, even when I set the thresholds much more liberally. So my initial thought was to change the statistical test to see whether I could get some signal, or to find out whether I made a mistake using enrichGO.

One more question, about the options for "pAdjustMethod": I see it is one of "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none". What is the difference between "hochberg", "BH", and "fdr"? Is BH Benjamini-Hochberg? I thought BH is the same as FDR.

ADD REPLY • link 2.3 years ago xiaofeiwang18266 ▴ 50

Several comments:

** clusterProfiler implements 2 types of enrichment tests: A) the aforementioned over-representation analysis (ORA), and B) gene set enrichment analysis (GSEA). Again, see here.

For the ORA test (A) a subset of genes is used as input, and a one-sided version of Fisher’s exact test is then performed to find enriched GO categories (or KEGG pathways, or...). Functions in clusterProfiler and its companion packages (DOSE, ReactomePA) that do this type of analysis include enrichGO, enrichKEGG, enrichDO, enrichWP, enrichPathway, and the universal one, enricher. The last part of each function name indicates which gene set database is used for the over-representation analysis (GO=Gene Ontology, DO=Disease Ontology, WP=WikiPathways, etc.). Be sure to see the help page of each function for all details.

For GSEA (B) a ranked list, based on all genes in your dataset, is used as input. The GSEA algorithm, originally developed by the Broad Institute and implemented in the R package fgsea, is used to check for gene sets that are enriched at the top or at the bottom of the ranked list. See here for another informative link on GSEA methodology. Functions that do this type of analysis include gseGO, gseKEGG, gseWP, and the universal one, GSEA.
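To illustrate what "enriched at the top or at the bottom of a ranked list" means, here is a deliberately simplified Python sketch of a running-sum enrichment score. It is an unweighted variant written only for illustration and is not the algorithm as implemented in fgsea or clusterProfiler (which also weight by the ranking statistic and estimate significance by permutation). The function name `enrichment_score` is hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score for one gene set.

    Walk down the ranked list: step up at every gene that is in the set,
    step down at every gene that is not, and report the running sum with
    the largest absolute value (positive = top, negative = bottom).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = list("ABCDEFGH")                    # toy list, best-ranked first
    print(enrichment_score(ranked, {"A", "B"}))  # set sits at the top
    print(enrichment_score(ranked, {"G", "H"}))  # set sits at the bottom
```

The point of the contrast with ORA is that no significance cut-off is applied to the genes beforehand: every gene in the dataset contributes through its rank.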

** When working with TAIR IDs, be sure to explicitly specify OrgDb = org.At.tair.db and, because enrichGO defaults to keyType = "ENTREZID", keyType = "TAIR" as well. See e.g. here: "No genes can be mapped...." using enrichGO in clusterProfiler.

** Regarding the difference in the number of significant GO categories between your approaches, this thread may also be of interest: Gene-GO-term relationship discrepancy between org.Hs.eg.db and geneontology.org

** FDR = false discovery rate. There are multiple ways of calculating an FDR. BH is indeed Benjamini-Hochberg; none means no multiple-testing correction will be performed, etc. Some more background info on the naming of the methods can be found here: p.adjust function fdr and BH. I am not an expert on the various FDR methods, so I cannot make a recommendation.... This paper seems to give a nice overview of all methods. FWIW: I always use the Benjamini-Hochberg method.
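On the naming question: in R's p.adjust, "fdr" is simply an alias for "BH", whereas "hochberg" is a different procedure (Hochberg's step-up method, which controls the family-wise error rate rather than the FDR). The Python sketch below re-implements both adjustments from their textbook definitions to show where they differ. It is an illustration, not R's source code, and the function names are made up.

```python
def _step_up(pvals, factor):
    """Shared step-up skeleton: go from the largest p-value to the smallest,
    multiply each by a rank-dependent factor, and keep a running minimum."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0                      # also caps the result at 1
    for pos, idx in enumerate(order):
        rank = n - pos                     # rank in ascending order, 1..n
        running_min = min(running_min, pvals[idx] * factor(rank, n))
        adjusted[idx] = running_min
    return adjusted


def adjust_bh(pvals):
    """Benjamini-Hochberg: factor n / rank (what "BH" and "fdr" refer to)."""
    return _step_up(pvals, lambda rank, n: n / rank)


def adjust_hochberg(pvals):
    """Hochberg step-up: factor n - rank + 1."""
    return _step_up(pvals, lambda rank, n: n - rank + 1)


if __name__ == "__main__":
    p = [0.001, 0.01, 0.03, 0.04]          # toy p-values
    print(adjust_bh(p))
    print(adjust_hochberg(p))
```

Because n / rank is never larger than n - rank + 1, the BH-adjusted values are never larger than the Hochberg-adjusted ones, so BH will report at least as many significant terms at the same cut-off.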