I am confused about the GAGE workflow. I understand that GAGE is a functional class scoring tool, so it does not rely on a preset cutoff to pick significant genes. However, I have seen several workflows and example scripts that feed GAGE the output of DESeq2 after filtering on an adjusted p-value of 0.05. Shouldn't the input be the entire expression data set? In short, I am not sure exactly which two groups are compared in order to identify pathways that are significantly perturbed. If I use only the subset of genes that DESeq2 calls significant and run GAGE with gsets=kegg.sigmet, what comparison is being made?
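library(gage)
# Fetch the KEGG gene sets for human, using KEGG gene IDs
kegg_human <- kegg.gsets(species = "hsa", id.type = "kegg")
names(kegg_human)
# Keep only the signaling and metabolic pathways
kegg.sigmet <- kegg_human$kg.sets[kegg_human$sigmet.idx]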
Also, what is the key algorithmic difference between GSEA and GAGE? I have read that GSEA uses a Kolmogorov-Smirnov statistic and GAGE uses the Wilcoxon-Mann-Whitney test. I understand that both are non-parametric, rank-based tests, but is the difference that GAGE runs a two-sample t-test on the ranks, while GSEA tests whether the shapes of the cumulative distribution functions differ?
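To make this question concrete, here is a toy sketch in base R of the plain textbook versions of the tests I am asking about. The scores are simulated and the names (all_genes, in_set, set_scores, background) are made up for illustration. This is not the actual code of GSEA or GAGE; as far as I understand, GSEA uses a weighted running-sum variant of the Kolmogorov-Smirnov statistic with permutations, so treat it only as an illustration of what each test compares.

```r
set.seed(1)

# Simulated per-gene scores (e.g. log2 fold changes) for 1000 hypothetical genes
all_genes <- rnorm(1000)

# Hypothetical gene set: the first 50 genes, shifted upward to mimic a perturbed pathway
in_set <- seq_len(1000) <= 50
all_genes[in_set] <- all_genes[in_set] + 1

set_scores <- all_genes[in_set]   # genes inside the set
background <- all_genes[!in_set]  # all remaining genes

# Kolmogorov-Smirnov: do the two empirical cumulative distributions differ?
ks <- ks.test(set_scores, background)

# Wilcoxon-Mann-Whitney: is one group shifted relative to the other, based on ranks?
wmw <- wilcox.test(set_scores, background)

# Two-sample t-test on the raw scores: do the group means differ?
tt <- t.test(set_scores, background)

# Two-sample t-test on the ranks of the scores
rk <- rank(all_genes)
tt_rank <- t.test(rk[in_set], rk[!in_set])

p_values <- c(ks = ks$p.value, wilcoxon = wmw$p.value,
              t_test = tt$p.value, t_on_ranks = tt_rank$p.value)
print(p_values)
```

In every case the comparison here is "genes in the set" versus "all other genes", which is why I would expect the full gene list, not a pre-filtered one, to be the input.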
One last question, about the GAGE results: if I look at the output without specifying greater or less, the table shows both greater and less columns, but the q-values listed there do not match the ones I get when I extract separate lists for up- and down-regulated gene sets. Why is this?
Thank you for your help!