I really appreciate the capability of EGSEA to take the hard work out of running multiple gene set scoring algorithms and comprehensively comparing their outputs to find the most biologically meaningful results. However, I would like some clarification on the values reported in the Stats Table, so that I can be sure I am interpreting the results correctly.
In general terms, EGSEA runs gene set scoring based on logCPM values, generating gene set scores from (up to) 12 individual tools/algorithms/packages. Based on these gene set scores, differential gene set analysis between experimental conditions is then performed (mainly using limma), and for each tool (and comparison) the gene sets are ranked by their p-values. These individual ranks are reported in the right-hand part of the EGSEA Stats Table. To find a 'consensus' across the different tools, EGSEA then uses various approaches to combine the individual results into a final metric that can be used to identify the most biologically meaningful gene sets. These approaches are nicely detailed on page 4 of the Alhamdoosh et al. (2017) Bioinformatics paper and comprise ways either to combine the individual p-values into a 'consensus' p-value or to combine the individual ranks (average, median, minimum, ...). These 'consensus' p-values and ranks are also reported in the Stats Table. Is this correct so far?
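To make sure I am describing the aggregation step correctly, here is how I picture it, as a minimal Python sketch. It is not EGSEA's code: the gene set names, tool labels and p-values are hypothetical, Fisher's method is used only as one common example of a p-value combination (I am not assuming which method EGSEA applies by default), and ties in p-values are not handled.

```python
import math
from statistics import mean, median

# Hypothetical per-tool p-values (gene set -> tool -> p-value); not real EGSEA output.
pvals = {
    "SET_A": {"camera": 0.001, "roast": 0.010, "gsva": 0.030},
    "SET_B": {"camera": 0.200, "roast": 0.050, "gsva": 0.002},
    "SET_C": {"camera": 0.600, "roast": 0.400, "gsva": 0.500},
}

def per_tool_ranks(pvals):
    """Rank gene sets within each tool by ascending p-value (rank 1 = most significant).
    Ties are broken arbitrarily by sort order in this sketch."""
    tools = next(iter(pvals.values())).keys()
    ranks = {gs: {} for gs in pvals}
    for tool in tools:
        ordered = sorted(pvals, key=lambda gs: pvals[gs][tool])
        for r, gs in enumerate(ordered, start=1):
            ranks[gs][tool] = r
    return ranks

def fisher_combine(ps):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square with 2k df under H0.
    Uses the closed-form survival function for even degrees of freedom:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!"""
    k = len(ps)
    half = -sum(math.log(p) for p in ps)  # this is x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

ranks = per_tool_ranks(pvals)
for gs in pvals:
    r = list(ranks[gs].values())
    print(gs,
          "p.combined=%.3g" % fisher_combine(list(pvals[gs].values())),
          "avg.rank=%.2f" % mean(r),
          "med.rank=%g" % median(r),
          "min.rank=%d" % min(r))
```

In this toy example SET_A and SET_B both reach a minimum rank of 1, but their average and median ranks differ, which is the kind of distinction I understand the different rank-combination options are meant to capture.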
However, I wasn't able to find any details on how the 'avg.logfc', 'avg.logfc.dir' and 'direction' values in the Stats Table are calculated (apart from this forum entry: https://support.bioconductor.org/p/87467/). My first impression was that they are derived from the differences in gene set scores between the compared experimental conditions, as obtained from the differential gene set analysis, and thus describe the change (direction and magnitude) of the gene set score. However, that doesn't seem to be the case? Also, since the absolute values of 'avg.logfc' and 'avg.logfc.dir' differ, are they derived using two different calculations? Could you please explain how these values are generated (in words and with the respective equations)? Additionally, I would find it helpful if the score differences were also reported as an additional metric in the Stats Table. Thank you!
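For concreteness, the score-difference metric I have in mind could look roughly like the Python sketch below. It is only an illustration of my suggestion, not a description of how 'avg.logfc' or 'avg.logfc.dir' are actually computed (which is exactly what I am asking about). The sample names, group labels and scores are hypothetical, and `score_difference` is a made-up helper name.

```python
from statistics import mean

# Hypothetical per-sample gene set scores (e.g. from a GSVA-like scoring tool);
# sample labels and values are invented for illustration only.
scores = {
    "SET_A": {"ctrl_1": -0.20, "ctrl_2": -0.10, "trt_1": 0.35, "trt_2": 0.45},
    "SET_B": {"ctrl_1": 0.30, "ctrl_2": 0.20, "trt_1": -0.10, "trt_2": 0.00},
}
groups = {"ctrl": ["ctrl_1", "ctrl_2"], "trt": ["trt_1", "trt_2"]}

def score_difference(set_scores, groups, test="trt", ref="ctrl"):
    """Hypothetical metric: mean gene set score in the test group minus the
    mean score in the reference group. The sign gives the direction and the
    absolute value gives the magnitude of the change."""
    test_mean = mean(set_scores[s] for s in groups[test])
    ref_mean = mean(set_scores[s] for s in groups[ref])
    return test_mean - ref_mean

def direction(d):
    """Label the direction of a score difference."""
    return "up" if d > 0 else "down" if d < 0 else "none"

for gs, s in scores.items():
    d = score_difference(s, groups)
    print(gs, "score.diff=%.3f" % d, "direction=%s" % direction(d))
```

For a simple unweighted two-group comparison without covariates, this difference of group means should be the same quantity as the fitted group contrast from a linear model on the scores, so it would presumably come for free from the differential gene set analysis.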