Hi,
I often use the GSEA function of the clusterProfiler package downstream of my differential gene expression analysis. If I understand it correctly, clusterProfiler::GSEA uses fgsea under the hood to estimate significance levels, and for that it permutes the gene labels.
However, if I remember correctly, the "original" GSEA implementation (i.e. the one from Subramanian et al.) actually permutes the class labels to preserve gene-gene correlations.
So I was wondering whether there is an implementation of the "original" GSEA algorithm that can be called from R. I think the Python package gseapy can do it, for example.
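To make the distinction concrete, here is a toy Python sketch of the two permutation schemes. It is only an illustration under simplifying assumptions: it uses a plain mean-difference set score instead of the running-sum enrichment score of GSEA, and all function names and the simulated data are made up for this example. It is not what fgsea, clusterProfiler or the Broad implementation actually run.

```python
import random
from statistics import mean

def gene_stats(expr, labels):
    """Per-gene difference in mean expression (class 1 minus class 0)."""
    stats = []
    for row in expr:
        a = [x for x, l in zip(row, labels) if l == 1]
        b = [x for x, l in zip(row, labels) if l == 0]
        stats.append(mean(a) - mean(b))
    return stats

def set_score(stats, gene_set):
    """Simplified set statistic: mean of the per-gene statistics in the set."""
    return mean(stats[g] for g in gene_set)

def gene_permutation_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Null by drawing random gene sets of the same size (gene labels permuted).
    Per-gene statistics stay fixed, so gene-gene correlation is ignored."""
    rng = random.Random(seed)
    stats = gene_stats(expr, labels)
    observed = abs(set_score(stats, gene_set))
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(range(len(stats)), len(gene_set))
        if abs(set_score(stats, random_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def phenotype_permutation_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Null by shuffling class labels and recomputing all statistics.
    Whole sample columns stay intact, so gene-gene correlation is preserved."""
    rng = random.Random(seed)
    observed = abs(set_score(gene_stats(expr, labels), gene_set))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(set_score(gene_stats(expr, shuffled), gene_set)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def toy_data(seed=1, n_genes=50, n_per_class=10, set_size=5):
    """Simulated data (hypothetical): no true class effect, but the genes in
    the set share one latent factor and are therefore correlated."""
    rng = random.Random(seed)
    labels = [0] * n_per_class + [1] * n_per_class
    gene_set = list(range(set_size))
    shared = [rng.gauss(0, 1) for _ in labels]
    expr = []
    for g in range(n_genes):
        noise = [rng.gauss(0, 1) for _ in labels]
        if g in gene_set:
            expr.append([s + 0.5 * e for s, e in zip(shared, noise)])
        else:
            expr.append(noise)
    return expr, labels, gene_set

if __name__ == "__main__":
    expr, labels, gene_set = toy_data()
    print("gene permutation p:     ", gene_permutation_p(expr, labels, gene_set))
    print("phenotype permutation p:", phenotype_permutation_p(expr, labels, gene_set))
```

With correlated genes in the set, the gene-permutation null tends to be narrower than the phenotype-permutation null, which is the concern behind my question.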
Any pointers are much appreciated!
Cheers!
I have never used these implementations myself, but I know the Broad Institute has released one:
https://github.com/GSEA-MSigDB/GSEA_R
Also the Biometrics Research Branch at the National Cancer Institute did so:
https://brb.nci.nih.gov/BRB-ArrayTools/ArrayToolsRPackages.html (bottom of website)
Vignette: https://brb.nci.nih.gov/BRB-ArrayTools/RPackagesAndManuals/GSEA-vignettes.html
Thanks for the links, but note the following:
The Broad Institute did release R scripts for GSEA, but that was nearly 20 years ago. The scripts haven't been updated since 2005 and are not maintained or supported, see https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/R-GSEA_Readme . I have tried the 2005 R-GSEA scripts but found them so slow and memory hungry as to be essentially unusable. Most importantly, the 2005 Broad Institute R scripts are copyrighted in a way that prevents Bioconductor package authors from copying or adapting the code from R-GSEA into a new package.
The https://github.com/GSEA-MSigDB/GSEA_R documentation says that it "remains unsupported by the GSEA-MSigDB Team".
BRB-ArrayTools does not make any claim, as far as I can see, that their GSEA tool is equivalent to that published by the Broad Institute. They don't cite any of the Broad Institute papers, which would suggest that it is not exactly equivalent. The BRB-ArrayTools manual recommends the use of the GSA package by Efron and Tibshirani as an improvement on the Broad Institute method.
I just found out about the romer function from the limma package. Its documentation says that it tests a hypothesis similar to that of Gene Set Enrichment Analysis (GSEA) (Subramanian et al, 2005), so I assume that it also controls for inter-gene correlation. Am I right to assume that this is a reasonable alternative to the GSA package?
Yes, but I use and recommend camera() or cameraPR() instead of romer().
camera() and romer() both adjust for inter-gene correlations. But I prefer the purely competitive approach of camera() over the (difficult to interpret) combination of competitive and self-contained hypotheses that is tested by GSEA or romer().