Hi, sorry if this is an overly simple question, but I couldn't find a clear answer on the forums or in the vignette.
I'm running gene set enrichment analysis using the gseGO function in clusterProfiler. The function needs a ranked list of genes, which I'm planning to rank by log fold change. Should the gene list contain all genes, or only genes below a significance cutoff (e.g. padj < 0.05)? I know some people also rank by something like sign(fold change) * -log10(p-value). Should that metric be computed for all genes, or only for those below a significance cutoff?
If both inputs (all genes and padj<0.05) are valid, under what circumstances should you use one over the other?
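For GSEA, which is a functional class scoring (FCS) method, you should use all genes, not a subset. GSEA asks whether the members of a gene set sit preferentially toward the top or bottom of the full ranked list, so the non-significant genes are part of the input. If you keep only the genes below a significance cutoff, what you have is the input for an over-representation analysis (ORA), not for GSEA. For more information on the differences between the two approaches (FCS vs ORA), you may want to check the links in this post: Cluster profiler - KEGG analysis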
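To make the difference between the two inputs concrete, here is a minimal sketch in plain Python (not clusterProfiler code). The gene names and numbers are made up for illustration, and the helper names `ranked_list`, `by_lfc`, and `by_signed_p` are hypothetical. It builds the full ranked list that a GSEA-style method expects, using either of the two ranking metrics mentioned in the question, and separately the filtered gene set that an ORA-style method expects.

```python
import math

# Hypothetical differential-expression results (made-up values):
# (gene, log2 fold change, p-value, adjusted p-value)
results = [
    ("GeneA",  2.1, 1e-6,  1e-4),
    ("GeneB", -1.8, 2e-5,  9e-4),
    ("GeneC",  0.3, 0.40,  0.62),
    ("GeneD", -0.2, 0.55,  0.71),
    ("GeneE",  1.2, 0.003, 0.03),
    ("GeneF", -0.9, 0.08,  0.21),
]


def by_lfc(lfc, pvalue):
    """Rank by log fold change alone."""
    return lfc


def by_signed_p(lfc, pvalue):
    """Rank by sign(fold change) * -log10(p-value)."""
    return math.copysign(1.0, lfc) * -math.log10(pvalue)


def ranked_list(rows, metric):
    """Score ALL genes (no significance filter) and sort by decreasing score."""
    scored = [(gene, metric(lfc, p)) for gene, lfc, p, _padj in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# GSEA-style input: every gene, ranked.
gsea_input = ranked_list(results, by_lfc)
gsea_input_signed = ranked_list(results, by_signed_p)

# ORA-style input: only the genes passing the cutoff, with no ranking.
ora_input = [gene for gene, _lfc, _p, padj in results if padj < 0.05]

for gene, score in gsea_input:
    print(f"{gene}\t{score:+.2f}")
print("ORA gene set:", ora_input)
```

In R, the equivalent of `gsea_input` would be a named numeric vector sorted in decreasing order, which is the form gseGO expects for its geneList argument. The filtered set corresponds to what you would pass to an ORA function such as enrichGO, ideally together with the full list of tested genes as the background.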