fgsea usage question: which set of genes to use?
@alexandrgopanenko-11598

Hi!

I have a question about how to build the pre-ranked gene list.

For example, I have differential expression results obtained with DESeq2 or edgeR, i.e. a results table with metrics such as LFC, p.adj (or FDR), etc.

Should I use all genes with estimated LFC values (from the results table) in the fgsea analysis, or is it better to restrict the set to differentially expressed genes, i.e. those with p.adj (or FDR) < 0.05?

Thanks for helping!

Best wishes, Alexandr

ATpoint
@atpoint-13662
Germany

I am not the fgsea developer, but here are my two cents: you should use all genes, or at least all relevant genes. In DESeq2 those might be the genes that survive independent filtering (i.e. whose adjusted p-value is not NA); in edgeR, those that pass filterByExpr.

GSEA tests whether a gene set as a whole shows evidence of being over- or underexpressed, rather than testing individual genes as the mentioned tools do in a pairwise comparison. A gene set can, as a whole, show evidence of overexpression even though no single gene in it needs to be individually overexpressed (i.e. significant) in the pairwise comparison. Pairwise DE testing and GSEA simply answer two different types of questions. For DESeq2 I would therefore use all genes surviving independent filtering, ranked e.g. by the moderated, shrunken LFCs obtained with lfcShrink.

Because we rank genes for GSEA, we lose the information about the magnitude of the ranking metric (here, the fold changes), so GSEA informs about global tendencies. I think it makes sense to always pair GSEA results with other information, such as the fold changes from DESeq2. If your GSEA result is significant, but the DESeq2 fold changes for the genes of the pathway you are testing with fgsea turn out to be tiny (very close to zero), then it is questionable whether the result is biologically meaningful, even though the analysis was significant in GSEA rank space. More generally, combining different analysis methods before making confident statements always makes sense, not just in the GSEA context. Does that make sense to you?
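To illustrate the two points above (keep every tested gene, and a set can score highly without any individually significant member), here is a minimal Python sketch. It is not fgsea's implementation: fgsea is an R package and also computes p-values, which this sketch omits. It assumes a hypothetical results table of `(gene, shrunken_lfc, padj)` tuples in which `padj` is `None` for genes removed by independent filtering, and it computes only the classic GSEA running-sum enrichment score. The helper names `ranked_genes` and `enrichment_score` and the toy data are made up for illustration.

```python
from typing import Optional, Sequence

# Hypothetical DE results row: (gene, shrunken LFC, padj).
# padj is None when independent filtering removed the gene (NA in DESeq2).
Row = tuple[str, float, Optional[float]]


def ranked_genes(results: Sequence[Row]) -> list[tuple[str, float]]:
    """Keep every tested gene (padj not None), significant or not,
    sorted from most up- to most down-regulated by LFC."""
    tested = [(gene, lfc) for gene, lfc, padj in results if padj is not None]
    return sorted(tested, key=lambda item: item[1], reverse=True)


def enrichment_score(ranked: list[tuple[str, float]],
                     gene_set: set[str], p: float = 1.0) -> float:
    """Classic GSEA running sum: walk down the ranked list, stepping up at
    set members (weighted by |stat|**p) and down at non-members; return
    the signed maximum deviation from zero. p = 0 uses ranks only."""
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    hit_norm = sum(abs(stat) ** p for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking statistic")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, hits):
        running += abs(stat) ** p / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    # Toy, made-up results: A, B and C are not significant on their own.
    results = [
        ("A", 0.9, 0.20), ("B", 0.8, 0.30), ("C", 0.7, 0.25),
        ("D", 0.1, 0.90), ("E", 0.0, 0.95), ("F", -0.1, 0.90),
        ("G", -0.2, 0.80), ("H", -2.5, 0.001), ("I", 0.3, None),
    ]
    ranked = ranked_genes(results)
    print([gene for gene, _ in ranked])  # "I" was filtered out (padj is None)
    print(enrichment_score(ranked, {"A", "B", "C"}))  # close to 1.0
```

In this toy data, filtering on padj < 0.05 would leave only gene H, so the set {A, B, C} could not be tested at all; ranking all tested genes places its three members at the very top and gives an enrichment score near 1.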


I did my analysis as you described. I asked the community because I wanted to be confident that I was doing it the right way.