Hi,
I have generated the differential expression results for my RNA-seq data using DESeq2. Based on my research, there are two ways of generating the GSEA preranked list: 1) by log2FC; 2) by p value.
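For reference, here is a minimal sketch of both options in Python. It assumes the DESeq2 results were exported to CSV (for example with R's write.csv on the results table), so the gene IDs sit in the first column and the standard DESeq2 columns log2FoldChange and pvalue are present. The helper names (read_deseq2_csv, rank_by_lfc, rank_by_signed_p, to_rnk) and the sample values are hypothetical. The output is a two-column, tab-separated gene/score list of the kind GSEA's preranked mode reads as a .rnk file; the "by p value" option is signed with the direction of the fold change so that up- and down-regulated genes land at opposite ends.

```python
import csv
import io
import math

def read_deseq2_csv(text):
    """Parse a DESeq2 results CSV; the first column is assumed to hold gene IDs."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    lfc_i = header.index("log2FoldChange")
    p_i = header.index("pvalue")
    rows = []
    for rec in reader:
        # DESeq2 reports NA for genes it could not test (e.g. all-zero counts); skip them.
        if rec[lfc_i] == "NA" or rec[p_i] == "NA":
            continue
        rows.append((rec[0], float(rec[lfc_i]), float(rec[p_i])))
    return rows

def rank_by_lfc(rows):
    """Option 1: score each gene by its log2 fold change."""
    return {gene: lfc for gene, lfc, _ in rows}

def rank_by_signed_p(rows, floor=1e-300):
    """Option 2: score by sign(log2FC) * -log10(p); p is floored to avoid log10(0)."""
    scores = {}
    for gene, lfc, p in rows:
        sign = (lfc > 0) - (lfc < 0)
        scores[gene] = sign * -math.log10(max(p, floor))
    return scores

def to_rnk(scores):
    """Tab-separated gene/score lines, sorted from most positive to most negative."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return "".join(f"{gene}\t{score:.6g}\n" for gene, score in ordered)

# Hypothetical toy export; geneD mimics an untestable gene with NA results.
SAMPLE = (
    '"","baseMean","log2FoldChange","lfcSE","stat","pvalue","padj"\n'
    '"geneA",500,1.5,0.2,7.5,6e-14,1e-12\n'
    '"geneB",3,4.0,2.5,1.6,0.11,0.4\n'
    '"geneC",800,-0.8,0.1,-8.0,1.2e-15,3e-14\n'
    '"geneD",0,NA,NA,NA,NA,NA\n'
)

if __name__ == "__main__":
    rows = read_deseq2_csv(SAMPLE)
    print(to_rnk(rank_by_lfc(rows)), end="")
    print(to_rnk(rank_by_signed_p(rows)), end="")
```

With these toy values the low-count geneB tops the log2FC ranking, while geneA tops the signed p value ranking, which is exactly the tension described in the question.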
Each metric has its own shortcomings. For example, a ranking by log2FC is distorted by the high variance of fold-change estimates for genes with low counts, while a ranking by p value is biased toward genes with higher abundance and longer transcripts.
I have a thought: is it possible to weight log2FC by p value or padj and then generate the GSEA ranked list? For example, genes A and B have the same log2FC but different p values, with A having the smaller p value and B the bigger one. We would then give more weight to gene A than to gene B based on their p values. Does this make sense, or is it completely statistically wrong? If it makes sense, what mathematical formula should be used to perform this transformation using log2FC and p value?
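One way to express that weighting, and the one proposed later in this thread, is to multiply log2FC by -log10(p): the smaller the p value, the larger the multiplier. Below is a minimal Python sketch applied to genes A and B from the question. The function name weighted_lfc and the numbers are hypothetical; padj could be passed in place of p. The floor on p is an assumption to keep log10 finite when a p value underflows to 0.

```python
import math

def weighted_lfc(lfc, p, floor=1e-300):
    """Weight a log2 fold change by the evidence behind it: log2FC * -log10(p).

    A smaller p value gives a larger weight. p is floored so that p == 0
    (possible after numerical underflow) does not make log10 fail.
    """
    return lfc * -math.log10(max(p, floor))

# Genes A and B from the question: same log2FC, different p values.
score_a = weighted_lfc(2.0, 1e-8)   # 2.0 * 8, about 16
score_b = weighted_lfc(2.0, 1e-2)   # 2.0 * 2, about 4

if __name__ == "__main__":
    print(score_a, score_b)
```

Note that this is a heuristic score rather than a test statistic: a gene with p close to 1 gets a score close to 0 whatever its fold change, and the sign of the score still follows the direction of log2FC.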
Thank you!!!
I'm running into the exact same question 5.7 years after the original post. I was wondering if there is an update on the recommended practice for gene set enrichment downstream of DESeq2?
Thank you!!
I use goseq.
Would combining both as a ranking metric via
log2FC * -log10(p-value)
overcome these shortcomings? Or would it introduce new ones?
That's fine, I guess. I don't know what the meaning of that term is, which is a downside in my opinion.
A posterior effect size is an estimate of an effect, which has a nice interpretation.
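To make the "posterior effect size" idea concrete, here is a deliberately simplified Python sketch of normal-normal shrinkage: a prior N(0, prior_var) on the true log2FC combined with the DESeq2 estimate and its standard error (the lfcSE column). The posterior mean is lfc * prior_var / (prior_var + lfcSE^2), so imprecise estimates are pulled toward zero. This is a textbook illustration, not what DESeq2's lfcShrink does with its apeglm or ashr options, which use different priors estimated from the data; the function name and prior_var = 1.0 are arbitrary assumptions.

```python
def posterior_mean_lfc(lfc, lfc_se, prior_var=1.0):
    """Normal-normal shrinkage of a log2FC estimate toward 0.

    Prior: true LFC ~ N(0, prior_var). Likelihood: lfc ~ N(true LFC, lfc_se**2).
    The posterior mean scales lfc by prior_var / (prior_var + lfc_se**2), so noisy
    (high-SE) estimates are pulled harder toward zero.
    """
    weight = prior_var / (prior_var + lfc_se ** 2)
    return lfc * weight

# Hypothetical values: a well-measured gene and a low-count gene with the same raw LFC.
high_count = posterior_mean_lfc(3.0, 0.2)   # 3.0 / 1.04, about 2.88
low_count = posterior_mean_lfc(3.0, 2.5)    # 3.0 / 7.25, about 0.41

if __name__ == "__main__":
    print(high_count, low_count)
```

Ranking by a shrunken value like this addresses the low-count bias from the original question directly, because genes whose fold change is estimated imprecisely no longer crowd the top of the list.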
Regarding GSEA preranking metrics, what I’ve seen, including in this thread, is that many (or most) people use either
logFC
or
sign(logFC) * -log10(pval).
Both have disadvantages, because each looks at only one of the two important aspects of DE analysis: the biological change in expression between conditions regardless of significance, or the significance of DE between conditions regardless of biological change. Each also has the biases mentioned in the OP. Multiplying both terms,
logFC * -log10(pval),
seems to produce a better ranking metric than either term alone, because it takes both aspects into account.
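A quick way to see how the three metrics differ is to rank a handful of genes with each. The Python sketch below uses four hypothetical genes chosen to mimic the biases discussed above; the gene names and values are invented for illustration only.

```python
import math

# Hypothetical toy genes: name -> (log2FC, p value)
GENES = {
    "big_noisy": (4.0, 0.2),    # large change, weak evidence (e.g. low counts)
    "small_sig": (0.3, 1e-20),  # tiny change, strong evidence (e.g. high counts)
    "both": (2.0, 1e-6),        # sizeable change with strong evidence
    "down": (-1.5, 1e-4),       # down-regulated gene
}

def neg_log10(p, floor=1e-300):
    """-log10(p), with p floored so that p == 0 stays finite."""
    return -math.log10(max(p, floor))

def sign(x):
    """Return 1, 0 or -1, like R's sign()."""
    return (x > 0) - (x < 0)

METRICS = {
    "logFC": lambda lfc, p: lfc,
    "sign(logFC) * -log10(pval)": lambda lfc, p: sign(lfc) * neg_log10(p),
    "logFC * -log10(pval)": lambda lfc, p: lfc * neg_log10(p),
}

def ranking(metric):
    """Gene names ordered from most positive to most negative score."""
    return sorted(GENES, key=lambda g: metric(*GENES[g]), reverse=True)

if __name__ == "__main__":
    for name, metric in METRICS.items():
        print(f"{name:28s} {ranking(metric)}")
```

With these values, logFC puts big_noisy first, the signed p value puts small_sig first, and the product puts both first. The product is not free of the original biases, though: because -log10(p) is unbounded, a very highly expressed gene with a tiny fold change and an extreme p value (here small_sig) can still outrank a gene with a large but less certain change.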