Hi,
I have a question about how the pre-ranked list of gene values should be supplied. For example, I take my differential expression results for treatment versus control and create a new rank statistic: the -log10 of the FDR value multiplied by the sign of the log fold change. Then I sort the vector in increasing order, so the downregulated genes are at the top and the upregulated genes are at the bottom. This is shown in the code below:
newRank_pvalueAndFC = -log10(myDEresults$FDR) * sign(myDEresults$logFC)
names(newRank_pvalueAndFC) = rownames(myDEresults)
newRank_pvalueAndFC = newRank_pvalueAndFC[order(newRank_pvalueAndFC)]
In other words, the downregulated genes with a really small FDR value are at the top and the upregulated genes with a really small FDR value are at the bottom. An example of my new rank vector looks like this:
Gene1 -5.2
Gene2 -3.1
Gene3 -0.1
Gene4 1.2
Gene5 3.2
My question is: does the ordering of the pre-ranked list matter? That is, given the way NES is calculated, do the upregulated genes have to be at the top and the downregulated genes at the bottom, so that a positive NES for a pathway corresponds to the pathway's genes being active or upregulated in treatment vs control? Or can the downregulated genes be at the top, as I have them, with GSEA taking the sign of the values into account, so that even though those genes are at the top, the NES will be negative for that pathway, meaning the pathway is enriched for genes that are down in treatment vs control? Also, does fgsea re-rank the pre-ranked vector I supply? If it does, what is the benefit of supplying a pre-ranked vector?
I appreciate any help with this, as I am new to GSEA and to using it for enrichment.
Thank you for your reply. I know you mentioned that there is another post similar to this one. How are the genes in my example above ordered: are they just reordered in decreasing order? And does fgsea take the sign into account in the ordered list when calculating NES, or do genes found at the top give a positive NES whether they are downregulated or upregulated in DE? I appreciate your help with this.
They are reordered in decreasing order.
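To illustrate what that reordering implies, here is a minimal base R sketch using the toy values from the question. It is not fgsea's internal code; the variable names are made up for illustration. It only shows that sorting in decreasing order gives the same ranked list no matter how the input vector was ordered.

```r
# Toy rank statistic from the question (values are illustrative only)
stats <- c(Gene1 = -5.2, Gene2 = -3.1, Gene3 = -0.1, Gene4 = 1.2, Gene5 = 3.2)

# The same values supplied in increasing order and in a shuffled order
increasing <- sort(stats)
shuffled <- stats[c("Gene3", "Gene5", "Gene1", "Gene4", "Gene2")]

# Sorting in decreasing order yields the same ranked list either way
ranked_a <- sort(increasing, decreasing = TRUE)
ranked_b <- sort(shuffled, decreasing = TRUE)

print(names(ranked_a))
print(identical(ranked_a, ranked_b))
```

So under this assumption the most positive values (here, upregulated genes) end up at the top of the ranked list regardless of the order in which the vector was supplied.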
Thank you!
I have a follow-up question: does fgsea treat positive and negative numbers differently in the enrichment score calculation? Or are the numbers simply used for sorting?
The numbers are only used for sorting, that is, to create a ranked list, and are then 'ignored'. So your second statement is the correct one.
For background concepts, see e.g. https://www.pathwaycommons.org/guide/primers/data_analysis/gsea/
Guido, that's not quite true. The numbers are not ignored after ranking: their absolute values are used in the statistic calculation. Otherwise it would just be a Kolmogorov-Smirnov test. The values are ignored only when the gseaParam=0 option is set, because raising them to the power 0 makes them all equal to 1, which does turn the test into Kolmogorov-Smirnov. In short, the numbers are first used for ranking, and then their absolute values are used in calculating the GSEA statistic.
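A rough base R sketch of the idea, for illustration only: this is a simplified weighted running-sum statistic, not fgsea's actual implementation, and running_es and its gsea_param argument are hypothetical names chosen to mirror the gseaParam option.

```r
# Hypothetical helper: simplified GSEA-style enrichment score
running_es <- function(stats, gene_set, gsea_param = 1) {
  stats <- sort(stats, decreasing = TRUE)   # step 1: values are used for ranking
  in_set <- names(stats) %in% gene_set
  weights <- abs(stats)^gsea_param          # step 2: absolute values are used as weights
  # Genes in the set push the running sum up in proportion to their weight
  hit <- ifelse(in_set, weights / sum(weights[in_set]), 0)
  # Genes outside the set push it down by a constant amount
  miss <- ifelse(in_set, 0, 1 / sum(!in_set))
  running <- cumsum(hit - miss)
  running[which.max(abs(running))]          # signed maximum deviation from zero
}

# Toy values from the question (illustrative only)
stats <- c(Gene1 = -5.2, Gene2 = -3.1, Gene3 = -0.1, Gene4 = 1.2, Gene5 = 3.2)

es_weighted <- running_es(stats, c("Gene5", "Gene3"), gsea_param = 1)
es_unweighted <- running_es(stats, c("Gene5", "Gene3"), gsea_param = 0)
es_down <- running_es(stats, c("Gene1", "Gene2"), gsea_param = 1)

print(c(weighted = es_weighted, unweighted = es_unweighted, down = es_down))
```

With these toy values the set {Gene5, Gene3} scores about 0.97 with gsea_param = 1 and about 0.67 with gsea_param = 0, so the magnitudes do affect the statistic. The set of downregulated genes gets a negative score because its members sit at the bottom of the ranked list, not because the weights themselves carry a sign.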