gseGO input list
leakerb • 0
United States

Hi, sorry if this is an overly simple question, but I couldn't find a clear answer on the forums or in the vignette. I'm running gene set enrichment using the gseGO function in clusterProfiler. The function needs a list of genes, which I'm planning to rank by log fold change. Should the gene list contain all genes, or just the genes below a significance cutoff (e.g. padj < 0.05)? I know some people also rank using something like (signed fold change * -log10 p-value). Should that metric use all genes or only those below a significance cutoff? If both inputs (all genes and padj < 0.05) are valid, under what circumstances should you use one over the other?

clusterProfiler • 3.4k views
Guido Hooiveld ★ 4.0k
Wageningen University, Wageningen, the …

For GSEA, which is a functional class scoring (FCS) method, you should use all genes, not a subset. If you use a subset, then you are in effect performing an over-representation analysis (ORA). For more info on the differences between the methods (FCS vs ORA) you may want to check the links in this post: Cluster profiler - KEGG analysis
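To make the "all genes" point concrete, here is a minimal sketch of how the input for gseGO() could be prepared. The `res` table, its column names, and the toy Entrez IDs are made up for illustration, and the choice of the human annotation package org.Hs.eg.db is an assumption; `run_gsego()` is a hypothetical wrapper that is only defined here, not called on the toy data.

```r
# Hypothetical differential-expression results for ALL tested genes.
# Column names and IDs are illustrative, not from a real data set.
res <- data.frame(
  entrez         = c("101", "102", "103", "104", "105", "106"),
  log2FoldChange = c(2.1, -0.4, NA, 0.05, -1.8, 0.9),
  padj           = c(0.001, 0.60, NA, 0.95, 0.01, 0.20),
  stringsAsFactors = FALSE
)

# Drop only genes without a usable statistic; do NOT filter on padj.
res <- res[!is.na(res$log2FoldChange), ]

# gseGO() expects a named numeric vector sorted in decreasing order.
gene_list <- res$log2FoldChange
names(gene_list) <- res$entrez
gene_list <- sort(gene_list, decreasing = TRUE)

# Hypothetical wrapper around the real call; assumes human Entrez IDs and
# that clusterProfiler and org.Hs.eg.db are installed.
run_gsego <- function(ranked) {
  clusterProfiler::gseGO(
    geneList = ranked,
    OrgDb    = org.Hs.eg.db::org.Hs.eg.db,
    keyType  = "ENTREZID",
    ont      = "BP"
  )
}
```

With a real, genome-wide ranked vector you would then call `run_gsego(gene_list)`. Duplicated gene IDs should be resolved before ranking, since the names of the vector are used to match genes to GO terms.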

The default ranking metric for GSEA is the so-called Signal2Noise metric, but obviously other metrics can be used. FYI: since I use limma for my analyses, I routinely use its moderated t-values as the ranking metric. For more background / food-for-thought on this, see the GSEA website at the Broad Institute, or e.g. this paper.
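As a rough illustration of swapping ranking metrics, the sketch below builds two rankings from the same table: one from the moderated t-statistic and one from the signed -log10 p-value mentioned in the question. The toy table `tt` only mimics the column names that limma's topTable() returns (`logFC`, `t`, `P.Value`); the values, gene names, and the helper `make_ranking()` are hypothetical.

```r
# Toy table with limma-style column names; in practice something like
# limma::topTable(fit, coef = 2, number = Inf) would give all genes.
tt <- data.frame(
  gene    = c("g1", "g2", "g3", "g4"),
  logFC   = c(2.0, -1.5, 0.3, -0.1),
  t       = c(5.2, -4.1, 0.8, -0.2),
  P.Value = c(1e-4, 1e-3, 0.4, 0.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name the statistic by gene ID, drop non-finite
# values (e.g. Inf from a p-value of exactly 0), sort decreasing.
make_ranking <- function(stat, ids) {
  names(stat) <- ids
  stat <- stat[is.finite(stat)]
  sort(stat, decreasing = TRUE)
}

# Ranking 1: moderated t-statistic.
by_t <- make_ranking(tt$t, tt$gene)

# Ranking 2: signed -log10 p-value.
by_signed_p <- make_ranking(sign(tt$logFC) * -log10(tt$P.Value), tt$gene)
```

Either vector can be passed as `geneList` to gseGO(). Both metrics are computed over every gene in the table, with no significance cutoff applied.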

Also, to perform an ORA (based on Gene Ontology) in clusterProfiler you will need to use the function enrichGO().
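For contrast, a minimal ORA sketch: here the significance cutoff is applied, and the full set of tested genes is supplied as the background via the `universe` argument. As before, the `res` table and IDs are invented, org.Hs.eg.db (human) is an assumption, and `run_enrichgo()` is a hypothetical wrapper that is defined but not called on the toy data.

```r
# Hypothetical differential-expression results; values are illustrative.
res <- data.frame(
  entrez = c("101", "102", "104", "105", "106"),
  padj   = c(0.001, 0.60, 0.95, 0.01, 0.20),
  stringsAsFactors = FALSE
)

# ORA input: the subset of genes passing the cutoff ...
sig_genes <- res$entrez[!is.na(res$padj) & res$padj < 0.05]

# ... and, as background, every gene that was actually tested.
universe <- res$entrez[!is.na(res$padj)]

# Hypothetical wrapper around the real call; assumes human Entrez IDs and
# that clusterProfiler and org.Hs.eg.db are installed.
run_enrichgo <- function(genes, background) {
  clusterProfiler::enrichGO(
    gene     = genes,
    universe = background,
    OrgDb    = org.Hs.eg.db::org.Hs.eg.db,
    keyType  = "ENTREZID",
    ont      = "BP"
  )
}
```

On real data this would be run as `run_enrichgo(sig_genes, universe)`.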


Very helpful, thank you!

