I have generated differential expression results for my RNA-seq data using DESeq2. From what I have read, there are two common ways of generating the GSEA preranked list: 1) by log2FC; 2) by p-value.
Each metric has its shortcomings. For example, ranking by log2FC is biased by the larger variance of genes with low counts, while ranking by p-value is biased toward genes with higher abundance and longer transcripts.
I have a thought: is it possible to weight log2FC by the p-value or padj and then generate the GSEA ranked list? For example, genes A and B have the same log2FC but different p-values, with A having the smaller p-value and B the larger one. We would then give more weight to gene A than to gene B based on their p-values. Does this make sense, or is it statistically wrong? If it makes sense, what mathematical formula should be used to combine log2FC and p-value into a single ranking score?
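
To make the idea concrete, here is a minimal Python sketch of one form such a score could take: log2FC multiplied by -log10(p-value). This is only my assumption of what a weighting might look like, not an established or validated formula. The function name `rank_score`, the `min_p` floor, and the gene values are all made up for illustration.

```python
import math


def rank_score(log2fc, pvalue, min_p=1e-300):
    """Hypothetical ranking metric: log2FC weighted by -log10(p-value).

    The sign (direction of change) comes from log2FC; the magnitude is
    scaled up for genes with smaller p-values.
    """
    # Floor the p-value so that p == 0 does not produce an infinite score.
    p = max(pvalue, min_p)
    return log2fc * -math.log10(p)


# Made-up values: (log2FC, p-value). Genes A and B share the same log2FC.
genes = {
    "geneA": (2.0, 1e-6),
    "geneB": (2.0, 0.05),
    "geneC": (-1.5, 1e-4),
}

# Sort from most positive to most negative score, as a preranked list expects.
ranked = sorted(
    ((name, rank_score(lfc, p)) for name, (lfc, p) in genes.items()),
    key=lambda item: item[1],
    reverse=True,
)

# Print as tab-separated gene/score pairs.
for name, score in ranked:
    print(f"{name}\t{score:.3f}")
```

With this score, gene A ends up ranked above gene B even though their log2FC is identical, which is the behavior I am asking about. Genes with missing p-values would have to be filtered out before scoring.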