Question

GSEA preranked analysis downstream of DESeq2

1

EJ ▴ 20

@ej-11019

Last seen 2.8 years ago

USA, Boston, Harvard Medical School

Hi,

I have generated differential expression results for my RNA-seq data using DESeq2. From what I have read, there are two ways of generating the GSEA preranked list: 1) by log2FC; 2) by p value.

Each metric suffers from certain shortcomings. For example, genes ranked by log2FC are biased by bigger variance in genes with low counts while genes ranked by p value are biased by genes with higher abundance and longer transcripts.

I have a thought: is it possible to weight log2FC by p value or padj and then generate the GSEA ranked list? For example, suppose genes A and B have the same log2FC but different p values, with A's p value smaller than B's. We would then give gene A more weight than gene B based on their p values. Does this make sense, or is it statistically wrong? If it makes sense, what mathematical formula should be used to combine log2FC and p value?

Thank you!!!

deseq2 gsea rnaseq • 15k views

ADD COMMENT • link updated 23 months ago by ATpoint ★ 4.8k • written 8.6 years ago by EJ ▴ 20

score 4 · Answer 1 · 2016-08-03

4

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

hi EJ,

A few comments:

"genes ranked by log2FC are biased by bigger variance in genes with low counts"

Note that this is not the case for DESeq2 log fold changes -- a unique property of our using Bayesian posterior estimates for LFC. See the DESeq2 paper or vignette, and examine an MA plot.

"while genes ranked by p value are biased by genes with higher abundance and longer transcripts"

For this consideration, you can use goseq following DESeq2. This method is specially designed to address this problem. There are a few posts on the support site on how to use goseq after DESeq2. I haven't had time to do any comparative analysis on the best methods for gene-set testing after DESeq2. I think goseq is the downstream method that I see most often used.

I do like the idea of methods that use the LFC or t-test, and aggregate across the genes in the set, which allows one to detect, at the level of gene set, when there is an abundance of marginal signal for each gene. I haven't had time to implement something for DESeq2 LFCs, although it's something I'm thinking of.

You might take a look also at the ROAST and CAMERA methods which are available in limma.

ADD COMMENT • link 8.6 years ago Michael Love 43k

0

I'm running into the exact same question 5.7 years after the original post. Was wondering if there is an update on the advised practice for gene set enrichment downstream of DESeq2?

Thank you!!

ADD REPLY • link 3.0 years ago changxu.fan ▴ 20

0

I use goseq.

ADD REPLY • link 3.0 years ago Michael Love 43k

0

Would combining both into a single ranking metric, log2FC * -log10(p-value), overcome these shortcomings? Or introduce new ones?

ADD REPLY • link 2.6 years ago hermidalc ▴ 20

1

That's fine I guess. I don't know what the meaning of that term is, which is a downside in my opinion.

A posterior effect size is an estimate of an effect, which has a nice interpretation.

ADD REPLY • link 2.6 years ago Michael Love 43k

1

Regarding GSEA preranking metrics, what I’ve seen, including in this thread, is that many (or most) people use either logFC or sign(logFC) * -log10(pval). Both have disadvantages, because each captures only one of the two important aspects of DE analysis: the biological change in expression between conditions regardless of significance, or the significance of DE between conditions regardless of biological change. Each also has the biases mentioned in the OP. Multiplying the two terms, logFC * -log10(pval), seems to produce a better ranking metric than either term alone, because it takes both aspects into account.
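To make the comparison concrete, here is a minimal sketch in plain Python of the three metrics discussed in this thread. It assumes the DESeq2 results have been exported as (gene, log2FoldChange, pvalue) rows; the gene names and values are made up for illustration, and the `floor` guard against p values of exactly 0 is my own assumption, not something DESeq2 or GSEA prescribes.

```python
import math


def ranking_metrics(log2fc, pvalue, floor=1e-300):
    """Return the three candidate pre-ranking scores for one gene."""
    # Guard against p values of exactly 0, which would give an infinite score.
    neg_log10_p = -math.log10(max(pvalue, floor))
    # Sign of the fold change: +1, 0, or -1.
    sign = (log2fc > 0) - (log2fc < 0)
    return {
        "lfc": log2fc,                     # effect size only
        "signed_p": sign * neg_log10_p,    # significance only, with direction
        "lfc_x_p": log2fc * neg_log10_p,   # product of the two
    }


# Hypothetical rows: (gene, log2FoldChange, pvalue)
results = [
    ("geneA", 2.0, 1e-8),
    ("geneB", 2.0, 1e-2),
    ("geneC", -0.5, 1e-10),
    ("geneD", 0.3, 1e-12),
]

scores = {gene: ranking_metrics(lfc, p) for gene, lfc, p in results}


def ranked(metric):
    """Gene names ordered from highest to lowest score for the given metric."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1][metric], reverse=True)
    return [gene for gene, _ in ordered]


for metric in ("lfc", "signed_p", "lfc_x_p"):
    print(metric, ranked(metric))
```

With these toy numbers, geneA and geneB tie on logFC alone, the signed p value puts the small-effect geneD first, and the product ranks geneA ahead of geneB while moving geneD below both. That only shows how the orderings differ; it says nothing about which ordering is statistically preferable.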

ADD REPLY • link 2.6 years ago hermidalc ▴ 20

score 0 · Answer 2 · 2023-04-25

0

Alessia • 0

@1b283aa4

Last seen 8 months ago

Spain

Is it better to use pval or adjusted pval?

ADD COMMENT • link 23 months ago Alessia • 0

1

pval, because padj has many ties.
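A small sketch of where the ties come from, assuming Benjamini-Hochberg adjustment (DESeq2's default). The `bh_adjust` helper below is written from scratch for illustration and is not DESeq2's code; the p values are made up.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    n = len(pvalues)
    # Visit p values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for steps_from_top, i in enumerate(order):
        rank = n - steps_from_top  # rank 1 is the smallest p value
        # The running minimum enforces monotonicity, and is what creates ties:
        # a gene can inherit the adjusted value of the next-larger p value.
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.001, 0.02, 0.021, 0.5, 0.9]
padj = bh_adjust(pvals)

print("distinct raw p values:     ", len(set(pvals)))
print("distinct adjusted p values:", len(set(padj)))
```

Here the second and third genes have different raw p values but end up with the same adjusted value, so a ranking built on padj cannot separate them.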

ADD REPLY • link 23 months ago ATpoint ★ 4.8k